Update on the Statistical Analysis of Traffic Countings on Two-Lane Rural Highways

This paper deals with the problem of model identification, calibration and validation for traffic countings on two-lane rural highways. A criterion for the preliminary selection of arrival laws as a function of appropriate sample statistics, and a technique for deciding whether sample data sets of traffic countings are consistent with stationary time series behavior, are suggested; besides the arrival laws currently used in research and engineering practice, the Neyman distribution has also been applied, although it is not frequently used in the field of traffic engineering. Moreover, these methods are applied to a set of empirical data derived from a recent survey on two two-lane rural highways; the arrival laws that best agree with the observations are found and the relations between the parameters identifying the arrival laws and the flow rates are worked out. Finally, the results are compared with those obtained from similar observations carried out by one of the authors in the past.


Introduction
Traffic counting represents the first and, to this day, foremost empirical measurement used to investigate vehicular flows; counting distributions have direct relevance to discrete-time point processes applied to road traffic (Bertò, Schoen, & Speranza, 1996). Statistical models for traffic counting, also known as probabilistic arrival laws, and vehicular headway distributions (Mauro & Branco, 2012; Ha, Aron, & Cohen, 2012) are used in road and highway engineering, i.e. in traffic simulation procedures, in vehicle density estimation and in the study of waiting phenomena at intersections and barriers. From the early 1930s until the late 1970s, special attention was devoted to the theoretical and applied aspects of these topics. Important studies were made by Kinzer (1933), Adams (1936), Breiman (1962) and Gerlough and Barnes (1971). Helpful guidelines for traffic engineers on the study and application of counting distributions are also contained in Gerlough and Huber (1975) and in May (1990). After a few decades theoretical research has resumed, and works in the field have been produced by Jabari and Liu (2012), Clementi, Monti and Silvestri (2011) and Cao, Tai and Chan (2012), who have analysed some statistical models for counting distributions.
Since the 1930s the Poisson law has been proposed as a theoretical distribution of arrivals, in that it is a characteristic flow model for discrete events under conditions of statistical regularity. Some interesting generalizations of this law were subsequently made as research on vehicular arrivals progressed. The Poisson law was introduced by Kinzer (1933) to process certain aerial traffic surveys previously carried out by Johnson (1928). Afterwards, it was applied by Adams (1936) in further numerical examples and by Greenshields, Shapiro and Ericksen (1947) in a work on intersections. Many applications of this traffic model were later suggested by Gerlough, first in a book entirely devoted to the topic (Gerlough, 1955) and more recently, in collaboration with others, in a paper that also deals with further theoretical distributions that may be used as arrival laws (Gerlough & Barnes, 1971). Finally, certain realistic models of vehicular flows leading to Poisson arrivals are described in the studies by Breiman (1962, 1963) and by Weiss and Herman (1962).
In the study of traffic conditions for which the Poisson model is not suitable, various scholars have suggested, and sometimes verified experimentally, different counting laws. Specifically, among the most interesting applied contributions, Beckmann et al. proposed a simple binomial distribution model (Beckmann, McGuire, & Winsten, 1956); Haight (1959) introduced the generalized Poisson law and studied it with reference to real cases in collaboration with others (Haight, Whilser, & Mosher, 1961); Buckley (1965) and Drew (1965) investigated the possibility of applying the negative binomial distribution; and, once again, Buckley (1968) further generalized the Poisson law. Theoretical works that propose counting models with little or no empirical validation should not be neglected either. Oliver and Thibault (1962), Buckley (1965), Buckley (1967), and Serfling (1969) can be consulted on laws of this type.
Finally, Ha, Aron and Cohen (2012) can be referred to for the most recent results; they have introduced innovative models for time headway and counting distributions, supported by empirical research on French roads.

Probabilistic Models for Arrivals
Experimental results have shown that, consistently with the statistical hypotheses underlying its derivation, the Poisson law is correctly applicable as a model for arrivals, on one or multiple lanes, if in the observed interval: a) the phenomenon remains stationary, i.e. no external perturbation intervenes and affects the flow; b) the gap between vehicles is such that they do not influence each other.
Because vehicles interact whenever their spacing falls below the distance at which mutual influence ceases, condition b) implies that the lower the flow, the more consistent the model is with empirical data. Considering the results of this research, the authors suggest applying the model to flow rates up to 400-500 vph in ordinary road conditions (dry weather, daylight, pavement in good condition). Thus, the Poisson law cannot generally be used without stationarity or at high flow rates; if these two conditions are not met, other probability laws are to be adopted. The criterion for choosing alternative counting models, available in the literature and presented in Gerlough and Huber (1975), is described below.
If the mean $\bar{x}$ and variance $s^2$ of the sample turn out to be substantially equal, one can assume that the statistical distribution may be close to the Poisson law. According to such a law, the mean $\mu$ and variance $\sigma^2$ are equal and this value completely defines the model. For the sake of simplicity, this paper does not deal with other statistical laws for which $\mu = \sigma^2$. If the mean of the sample turns out to be higher than the variance, the variability of the measurements is lower than that expected from purely Poissonian arrivals of equal mean. In that case, the empirical data can be checked for consistency with the positive binomial model or with the generalized Poisson model, according to which the mean is generally higher than the variance. The quantities $\mu$ and $\sigma^2$ also completely identify both distributions. In terms of traffic, the circumstance $\bar{x} > s^2$ has frequently been found in flow conditions far from free flow, when flow rates are usually very high. Finally, if the mean of the sample is lower than its variance (in other words, for a given mean value the countings appear more dispersed than those derived from purely Poissonian arrivals), the negative binomial distribution, also known as Pascal's law, should generally be considered, in that it presents, like the data, an expectation $\mu$ lower than the variance $\sigma^2$. In this case too, mean and variance completely identify the model. Cases of traffic countings where $\bar{x} < s^2$ and which comply with Pascal's law have been mainly observed in flow conditions developed under traffic-light regulation, although Buckley (1967) has used this distribution as a model for arrivals on roads with two or more lanes.
It is worth remembering that the choice of the length $t$ of the subintervals into which the observation period $T$ is divided is known to influence the identification of the model, as well explained by Gerlough and Barnes (1971). For example, this means that, in the same period of observation, a flow can be Poissonian if the data are recorded on subintervals $t_1$, whereas it can conform to other models if subintervals $t_2 < t_1$ are used. As for the estimates shown in this paper, the width of the interval was predetermined by the assumptions about the minimum extension of the periods of stationarity. Moreover, the duration $T$ of the observation period, for a constant subinterval length $t$, has been shown to affect the ratio between $\bar{x}$ and $s^2$: the longer the interval $T$, the larger the ratio (Miller, 1970).
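The selection criterion described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `preselect_arrival_law` and the relative tolerance `tol`, used here to decide when mean and variance are "substantially equal", are assumptions introduced for the example and do not come from the paper, which gives no numerical threshold.

```python
from statistics import fmean, pvariance


def preselect_arrival_law(counts, tol=0.10):
    """Suggest candidate arrival laws from the sample mean and variance.

    `tol` is a hypothetical relative tolerance on the variance-to-mean
    ratio; the paper does not specify a value.
    """
    mean = fmean(counts)
    if mean <= 0:
        raise ValueError("the sample mean must be positive")
    var = pvariance(counts, mu=mean)  # variance with divisor n
    ratio = var / mean
    if abs(ratio - 1.0) <= tol:
        return "Poisson"
    if ratio < 1.0:
        # mean higher than variance: under-dispersed counts
        return "binomial or generalized Poisson"
    # mean lower than variance: over-dispersed counts
    return "negative binomial"


if __name__ == "__main__":
    print(preselect_arrival_law([0, 0, 0, 6, 0, 0, 6, 0]))
```

The result is only a preliminary indication; the chosen law still has to be checked with a goodness-of-fit test.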
In addition to the selection criterion indicated in Table 1, a further criterion for the preliminary choice of the model, as a function of appropriate sample statistics, is shown. In some research areas (e.g. Biometrics) (Gore & Paranjpe, 2001), this criterion is more efficient than the simple comparison between the values of mean and variance: in order to have a greater amount of information than that gained only from the comparison of $\bar{x}$ and $s^2$, the indices $I$ and $L$ and their sample estimates $\hat{I}$ and $\hat{L}$ are used, where $\mu_3$ is the theoretical third central moment and $m_3$ is the corresponding frequency moment. On the plane (I,O,L) the point $(I, L)$ occupies a different position for each theoretical distribution (Figure 1): it lies on the segment AC if it represents a binomial distribution, on the half-line with origin in C if it represents a negative binomial distribution, and coincides with the point C if it represents the Poisson law (Ord, 1972).
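As a minimal sketch, the sample indices can be computed as follows. The definitions used here, $\hat{I} = s^2/\bar{x}$ and $\hat{L} = m_3/s^2$, are an assumption: they are the choice that reproduces the loci described above (a single point for the Poisson law, a segment for the binomial, a half-line for the negative binomial), but the original formulas are not legible in this text. The function name `sample_indices` is hypothetical, and moments are computed with divisor n.

```python
from statistics import fmean


def sample_indices(counts):
    """Return (I_hat, L_hat), assumed to be (s^2 / mean, m3 / s^2)."""
    n = len(counts)
    mean = fmean(counts)
    s2 = sum((x - mean) ** 2 for x in counts) / n  # sample variance
    m3 = sum((x - mean) ** 3 for x in counts) / n  # third central moment
    if mean <= 0 or s2 <= 0:
        raise ValueError("mean and variance must be positive")
    return s2 / mean, m3 / s2


if __name__ == "__main__":
    i_hat, l_hat = sample_indices([0, 0, 0, 1, 1, 1, 2, 3])
    print(i_hat, l_hat)
```

Under these assumptions a Poisson sample should give a point close to (1, 1), although third-moment estimates from short series are noisy.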
The position of the sample points $(\hat{I}, \hat{L})$ on the plane (I,O,L) relative to the loci defined above can help in choosing the theoretical distribution that best fits the sample data; the statistical hypothesis must, however, be verified by applying a hypothesis test to the chosen model. This method is very useful and easy to implement, but it is not always univocally discriminative, as shown later in this work. In Figure 1 the curve $L = f(I)$ of the Neyman type A biparametric distribution, with parameters $m_1$ and $m_2$, is also illustrated. The relations $L = f(I)$ for the models just mentioned, deduced from the expressions of $\mu$, $\sigma^2$ and $\mu_3$, are shown in Table 3.
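The relations $L = f(I)$ can be re-derived as a worked example. The derivation below is a sketch under stated assumptions: the indices are taken as $I = \sigma^2/\mu$ and $L = \mu_3/\sigma^2$, and the standard moment expressions of each law are used with the parametrizations indicated ($q = 1 - p$; negative binomial with mean $kq/p$; Neyman type A as a Poisson($m_1$) number of clusters of Poisson($m_2$) size). These may differ in notation from the paper's Table 3.

```latex
\begin{align*}
\text{Poisson } (\lambda): \quad
  & \mu = \sigma^2 = \mu_3 = \lambda
  && \Rightarrow\; I = 1,\; L = 1 \\
\text{Binomial } (n, p): \quad
  & \mu = np,\; \sigma^2 = npq,\; \mu_3 = npq\,(q - p)
  && \Rightarrow\; I = q,\; L = q - p = 2I - 1,\; 0 < I < 1 \\
\text{Negative binomial } (k, p): \quad
  & \mu = \frac{kq}{p},\; \sigma^2 = \frac{kq}{p^2},\; \mu_3 = \frac{kq\,(1 + q)}{p^3}
  && \Rightarrow\; I = \frac{1}{p},\; L = \frac{1 + q}{p} = 2I - 1,\; I > 1 \\
\text{Neyman type A } (m_1, m_2): \quad
  & \mu = m_1 m_2,\; \sigma^2 = m_1 m_2 (1 + m_2),\; \mu_3 = m_1 m_2 (1 + 3 m_2 + m_2^2)
  && \Rightarrow\; I = 1 + m_2,\; L = I + 1 - \frac{1}{I},\; I > 1
\end{align*}
```

With these definitions the binomial and negative binomial laws lie on the same straight line $L = 2I - 1$, on opposite sides of the Poisson point $(1, 1)$, while the Neyman type A curve lies below the negative binomial half-line, since $(2I - 1) - (I + 1 - 1/I) = (I - 1)^2 / I \ge 0$.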
Table 3. Relations L = f(I) for the probability models

The negative binomial and Neyman type A biparametric distributions belong to the larger family of probabilistic models known as "aggregate or contagious distributions". They are of great importance in Biometrics when the elements of the surveyed populations are generally found in groups or clusters. As for traffic, the equivalent situation is that of platoons. Therefore, in such a circumstance (running in platoons) traffic countings should be consistent with the negative binomial or Neyman type A biparametric distributions. This aspect of the problem will be dealt with in the following analysis of the empirical results.
The next section first presents a statistical procedure for the identification of steady-state periods; in fact, model identification, calibration and validation of counting laws have to be carried out on these stationarity periods. These periods need to be defined in order to search for links between the statistical parameters identifying the models and the flow rates or, more in general, for relations between flow rates and traffic process parameters.
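For readers who wish to experiment with the Neyman type A law, the following sketch evaluates its probabilities numerically. It assumes the usual interpretation of the law as a Poisson($m_1$) number of clusters, each containing a Poisson($m_2$) number of vehicles, and uses the standard recursion for compound Poisson distributions; the function names and the method-of-moments estimator are illustrative choices, not necessarily those adopted in the paper.

```python
import math


def neyman_type_a_pmf(m1, m2, x_max):
    """Return [P(X=0), ..., P(X=x_max)] for the Neyman type A law.

    Assumed parametrization: Poisson(m1) clusters, Poisson(m2) items each.
    """
    probs = [math.exp(-m1 * (1.0 - math.exp(-m2)))]  # P(X = 0)
    c = m1 * m2 * math.exp(-m2)
    for x in range(1, x_max + 1):
        total = 0.0
        term = 1.0  # holds m2**j / j!
        for j in range(x):
            total += term * probs[x - 1 - j]
            term *= m2 / (j + 1)
        probs.append(c / x * total)
    return probs


def neyman_moment_estimates(mean, var):
    """Method-of-moments estimates from mean = m1*m2, var = m1*m2*(1+m2)."""
    if var <= mean:
        raise ValueError("the variance must exceed the mean")
    m2 = (var - mean) / mean
    m1 = mean / m2
    return m1, m2


if __name__ == "__main__":
    m1, m2 = neyman_moment_estimates(1.0, 1.5)
    print(m1, m2, neyman_type_a_pmf(m1, m2, 5))
```

The recursion avoids the infinite series that appears in the closed-form expression of the probabilities.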

Flow Stationarity Tests
T is defined as the observation period of the traffic flow in a section of a lane. If T is divided into n smaller intervals $t_i$, i = 1, 2, …, n, of equal length t, and the mean of the process represented by the sequence $\{X_i\}$ of the random variables defined as "number of vehicles crossing the road section during the interval $t_i$" is constant, the flow is defined as stationary during the period T. In order to check that the realization $\{x_i\} = x_1, x_2, …, x_n$ of the sequence $\{X_i\}$ is drawn from a process of traffic countings with a constant mean, a "distribution free" test is used in this paper because, unlike other types (for instance, the sequential probability ratio test by Wald (1947)), it has the advantage of not requiring any statistical hypothesis on the arrival law. Through this test the constancy of the process mean is checked by verifying the hypothesis that the sequence $\{X_i\}$ is independent of the sequence of the first n natural numbers (Kendall & Stuart, 1967).
The parameter used in this test is the correlation coefficient $r$ between the two sequences, Equation (5), which is computed from the mean $\bar{x}$ and the variance $s^2$ of the elements of $\{X_i\}$. The distribution of Equation (5) is obtained by considering that the $n!$ values of $r$ calculated over the $n!$ permutations of the elements of $\{X_i\}$ are equiprobable. The extreme values of $r$ are evidently -1 and +1, where r = +1 if the sequence of traffic countings increases linearly with the natural numbers and r = -1 if the $\{X_i\}$ decrease linearly as the natural numbers increase. Choosing a suitably large n, typically n > 8, it is possible to limit the length of the acceptance zone of the test and to replace (5) with a parameter derived from it that is distributed according to a Student's t with (n - 2) degrees of freedom.
Through these parameters, the hypothesis of flow stationarity at level $\alpha$ is rejected when the test statistic falls outside the acceptance zone. One has to choose n so that the length t is greater than a minimum value; in this way it is possible to consider the elements of $\{X_i\}$ as independent and to apply to them a "distribution free" test, such as the one just mentioned.
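The stationarity check can be illustrated with a short sketch. It assumes that $r$ is the ordinary Pearson correlation between the counts and the indices 1, …, n, that the derived statistic is the usual transform $t = r\sqrt{(n-2)/(1-r^2)}$, and that the test is two-sided. The function names are hypothetical, and the default critical value 2.048 is the 0.975 quantile of Student's t with 28 degrees of freedom, i.e. it is valid only for n = 30 and $\alpha$ = 0.05.

```python
import math


def trend_statistic(counts):
    """Return (r, t): correlation of counts with 1..n and its t transform."""
    n = len(counts)
    if n < 3:
        raise ValueError("at least three subintervals are needed")
    idx_mean = (n + 1) / 2.0
    x_mean = sum(counts) / n
    sxy = sum((i - idx_mean) * (x - x_mean)
              for i, x in enumerate(counts, start=1))
    sxx = n * (n * n - 1) / 12.0  # sum of squared deviations of 1..n
    syy = sum((x - x_mean) ** 2 for x in counts)
    if syy == 0:
        return 0.0, 0.0  # constant counts show no trend
    r = sxy / math.sqrt(sxx * syy)
    if r * r >= 1.0:
        return r, math.copysign(math.inf, r)  # perfectly linear trend
    return r, r * math.sqrt((n - 2) / (1.0 - r * r))


def is_stationary(counts, t_crit=2.048):
    """Two-sided test; the default t_crit assumes n = 30 and alpha = 0.05."""
    _, t = trend_statistic(counts)
    return abs(t) <= t_crit


if __name__ == "__main__":
    print(trend_statistic([1, 2] * 15), is_stationary([1, 2] * 15))
```

For other values of n or $\alpha$ the critical value has to be taken from a Student's t table and passed explicitly as `t_crit`.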
Finally, the length t of the observation interval must be kept sufficiently small to avoid non-linear flow variations during the interval. Such fluctuations cannot be detected by the adopted test, which is effective only against the linear trend hypothesis. Consequently, the present study adopts t = 20 s and n = 30, so that the observation period is T = 10 min. With the foregoing assumptions, the flow is by definition considered stationary for intervals of less than 10 min.
In order to verify that the length t = 20 s is appropriate for considering the observed traffic countings as independent elements, a lag 1 serial correlation test is carried out. This verification is usually carried out by applying, to the elements of $\{X_i\}$, the autocorrelation coefficient given by Equation (8), which is expressed through $\mu$, $\sigma^2$ and $\mu_3$, the mean, variance and third central moment of the time headway distribution (Cox & Lewis, 1966). Using Equation (8) it is possible to calculate this coefficient and to evaluate whether it is small enough to accept the hypothesis of absence of statistical relationships between the elements of the traffic counting sequence.
In this research, however, no information about time headways is available; therefore, the lag 1 serial correlation test recalled above is applied through the quantity $r$ defined by Equation (9). For sufficiently large n (n > 15), the quantity in Equation (9) is normally distributed, with known mean and variance. Knowledge of the theoretical distribution of $r$ allows a common two-sided test to be performed at the chosen significance level $\alpha$: the hypothesis of independence is rejected if $r$ falls outside the corresponding acceptance interval.
Returning now to the correlation test for the verification of flow stationarity, it was applied to the surveyed data in two ways, with the values of n, t and T previously specified.
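A distribution-free lag 1 serial correlation test of this kind can be sketched as follows. The sketch assumes the circular serial statistic $R = \sum_i x_i x_{i+1}$ (with $x_{n+1} = x_1$) and its permutation mean and variance in the form given by Wald and Wolfowitz; whether this coincides with the quantity in Equation (9) cannot be established from the text, so the code should be read as one possible implementation. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def serial_correlation_test(counts, alpha=0.05):
    """Circular lag-1 serial correlation test under the permutation null.

    Returns (z, independent): z is the standardized statistic and
    `independent` is True when independence is not rejected.
    """
    n = len(counts)
    if n < 4:
        raise ValueError("at least four observations are needed")
    # Circular lag-1 product sum: x1*x2 + ... + x(n-1)*xn + xn*x1
    stat = sum(counts[i] * counts[(i + 1) % n] for i in range(n))
    s1, s2, s3, s4 = (sum(x ** k for x in counts) for k in (1, 2, 3, 4))
    mean = (s1 ** 2 - s2) / (n - 1)
    var = ((s2 ** 2 - s4) / (n - 1)
           + (s1 ** 4 - 4 * s1 ** 2 * s2 + 4 * s1 * s3 + s2 ** 2 - 2 * s4)
           / ((n - 1) * (n - 2))
           - mean ** 2)
    if var <= 0:
        raise ValueError("degenerate sample: the statistic has no variance")
    z = (stat - mean) / math.sqrt(var)
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided test
    return z, abs(z) <= z_crit


if __name__ == "__main__":
    print(serial_correlation_test([0] * 15 + [5] * 15))
```

The normal approximation is meant for reasonably long series (the text indicates n > 15); for very short series the exact permutation distribution would be preferable.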
The first way assumed a test size $\alpha$ = 0.05 and treated the countings of the first thirty $t_i$ from the start of the observation. If stationarity was found, three additional consecutive subintervals were added and the first three subintervals were excluded; the test was then repeated on the new sequence. The iteration stopped when non-stationarity of $\{X_i\}$ was found. The test was thus performed on arrivals per 10-minute interval, where the first minute corresponded to the last minute of the previous interval.
The range of acceptance of this test, applied to the empirical data, was not small, as the sample size n was not very large. The procedure for creating the data sets was therefore modified in order to obtain realizations of $\{X_i\}$ of gradually increasing size, and hence increasingly narrow acceptance intervals: starting from the first instant, the countings relating to the following $t_i$ were added one by one and the test was carried out, step by step, on larger and larger samples; the process stopped when a non-stationary flow was found. The procedure was then repeated starting from the traffic counting value corresponding to the last minute of the previous sequence, identifying further periods at a constant flow, and so forth up to the complete analysis of the available data.
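The sliding-window procedure just described can be sketched as below. The code is an interpretation, not the authors' implementation: it assumes the Student's t trend test on each window of 30 subintervals, a shift of 3 subintervals (one minute) per iteration, and it reports the stationarity period as the half-open index range covered by the accepted windows. All names and the default critical value (2.048, valid for n = 30 and $\alpha$ = 0.05) are assumptions made for the example.

```python
import math


def _t_statistic(window):
    """Student's t transform of the correlation between counts and 1..n."""
    n = len(window)
    idx_mean = (n + 1) / 2.0
    x_mean = sum(window) / n
    sxy = sum((i - idx_mean) * (x - x_mean)
              for i, x in enumerate(window, start=1))
    sxx = n * (n * n - 1) / 12.0
    syy = sum((x - x_mean) ** 2 for x in window)
    if syy == 0:
        return 0.0
    r = sxy / math.sqrt(sxx * syy)
    if r * r >= 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def stationary_period(counts, start=0, window=30, step=3, t_crit=2.048):
    """Return (start, end) of the stationarity period, end exclusive,
    or None if the first window is already non-stationary."""
    end = None
    pos = start
    while pos + window <= len(counts):
        if abs(_t_statistic(counts[pos:pos + window])) > t_crit:
            break  # trend detected: stop the iteration
        end = pos + window
        pos += step
    return None if end is None else (start, end)


if __name__ == "__main__":
    data = [2, 3] * 30 + list(range(10, 40))
    print(stationary_period(data))
```

Subsequent periods would be found by calling the function again with `start` set according to the restart rule adopted for the survey.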
The test presented a considerable inertia and revealed the presence of a trend only in very few cases.
For those reasons, the stationarity intervals identified through the first technique were then employed in the following analyses.
The methodology used here has been described by Esposito and Mauro (1994) who applied the technique for the identification of stationarity periods to empirical measures of traffic countings.After nearly two decades the authors used the same technique on newly-acquired empirical data, as specified below; the results from newly-collected data and their comparison with those obtained by Esposito and Mauro (1994) are presented in the following paragraph.

Empirical Data Analysis
The methods described in paragraphs 2 and 3 were applied to traffic counting data collected in the spring of 2012; these data refer to two sections of two-lane rural highways (one lane for each direction). In particular, the data of 1800 intervals with a width t = 20 s, equal to a total of 10 hours and 3053 vehicular passages, were analysed. During these periods, the observed road section was filmed by a video camera and the countings were later carried out by means of the recordings.
Empirical data have been surveyed on two road sections of the following roads in the province of Trento, Italy: 1) Provincial Road No. 36 "delle Grazie", in the municipality of Arco; 2) National Road No. 421 "dei Laghi di Molveno e Tenno", in the municipality of Tenno.
The first section is set on a long, flat and straight road where overtaking is disallowed. The roadway is about 7.00 m wide, with a lane width of 3.25 m and paved shoulders of 0.25 m. Table 4 shows some information gained from the countings on section 1. Section 2, instead, is set on a short, straight road, with a 6% grade, where overtaking is disallowed. The carriageway is about 6.50 m wide and the paved shoulders are about 0.50 m. Table 5 summarizes some information on the survey.
By applying the methodology described above to all the collected data, the observed intervals were divided into 23 stationarity periods, summarized in Table 6 and in Table 7 for the former and the latter road section respectively. For each stationarity period these tables contain the mean (in vehicles per 20 s and in vehicles per hour), the variance and the parameters I and L. By adopting the criteria laid down in paragraph 2, the theoretical distributions that best approximate the observed data for each period with a constant flow rate were analysed; the values of the parameters obtained by the calculations are also indicated in the tables.
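As an illustration of how a counting law can be calibrated on a stationarity period, the sketch below fits the negative binomial distribution by the method of moments. The parametrization (probabilities proportional to $p^k (1-p)^x$, mean $k(1-p)/p$, variance $k(1-p)/p^2$) and the function names are assumptions made for the example; the paper's own estimators are not legible in this text. Since the estimated $k$ is generally not an integer, the binomial coefficient is evaluated through the logarithm of the Gamma function.

```python
import math


def fit_negative_binomial(mean, var):
    """Method-of-moments estimates (p, k), assuming
    mean = k(1-p)/p and var = k(1-p)/p**2."""
    if mean <= 0 or var <= mean:
        raise ValueError("the variance must exceed a positive mean")
    p = mean / var
    k = mean * mean / (var - mean)
    return p, k


def negative_binomial_pmf(x, p, k):
    """P(X = x) for real k > 0, via the generalized binomial coefficient
    Gamma(x + k) / (Gamma(k) * x!) computed with log-Gamma."""
    log_coeff = math.lgamma(x + k) - math.lgamma(k) - math.lgamma(x + 1)
    return math.exp(log_coeff + k * math.log(p) + x * math.log1p(-p))


if __name__ == "__main__":
    # Hypothetical period: mean 1.7 vehicles per 20 s (1.7 * 180 = 306 vph)
    p, k = fit_negative_binomial(1.7, 2.5)
    print(p, k, [round(negative_binomial_pmf(x, p, k), 4) for x in range(6)])
```

The values 1.7 and 2.5 are invented for the example and are not taken from Table 6 or Table 7.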
No empirical series is well approximated by a Poisson distribution; the observed data, instead, appear to be approximated by the negative binomial (in most cases), binomial or Neyman type A distributions. The graphical representation of the sample points $(\hat{I}, \hat{L})$ on the plane (I,O,L) has helped to identify the most suitable theoretical model; in that regard see Figure 2 and Figure 3. These models were calibrated through the criteria indicated in paragraph 2 and their parameters are shown in Table 6 and Table 7. The agreement between the chosen models and the observations was then verified by performing a $\chi^2$ test and by calculating the relative squared index, which confirmed the above choices. More specifically, the relative squared index $I_{rs}$ is computed from $y_i$, the $i$-th observed value, and $\hat{y}_i$, the $i$-th value of the theoretical distribution. By way of example, Figure 4, Figure 5 and Figure 6 show the frequency distributions of three sets of empirical traffic countings, together with the corresponding theoretical distributions that best approximate the data; the figures also show the relative squared index $I_{rs}$ and the value of the probability $p_{\chi}$ obtained by applying a $\chi^2$ test to assess the conformity between the sample and the law chosen as its theoretical representation (note that values of $p_{\chi}$ tending to 1 support the hypothesis). An example for each analysed model (negative binomial, binomial and Neyman type A) is shown.
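A $\chi^2$ goodness-of-fit check of the kind mentioned above can be sketched with the standard library only. The sketch assumes that the reported probability is the upper-tail probability of the $\chi^2$ statistic, with degrees of freedom equal to the number of classes minus one minus the number of fitted parameters; the way classes were pooled in the paper is not known, so the grouping is left to the caller. Function names are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of the chi-square law, from the series of
    the regularized lower incomplete gamma function P(df/2, x/2)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
    if term == 0.0:  # underflow at the extremes of the range
        return 0.0 if z > a else 1.0
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    return max(0.0, 1.0 - total)


def chi2_goodness_of_fit(observed, expected_probs, n_fitted_params):
    """Return (statistic, degrees of freedom, upper-tail probability)."""
    df = len(observed) - 1 - n_fitted_params
    if df < 1:
        raise ValueError("not enough classes for the fitted parameters")
    total = sum(observed)
    stat = sum((o - total * p) ** 2 / (total * p)
               for o, p in zip(observed, expected_probs))
    return stat, df, chi2_sf(stat, df)


if __name__ == "__main__":
    print(chi2_goodness_of_fit([60, 40], [0.5, 0.5], 0))
```

The expected probabilities must sum to one over the classes, so the last class should collect the whole upper tail of the theoretical distribution.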
As for the binomial and negative binomial frequency distributions, in order to avoid approximations, the binomial coefficient was calculated by generalizing the factorial operator through the Gamma function $\Gamma(x)$, whose definition is of general validity.
As mentioned above, a similar analysis had been carried out by Esposito and Mauro (1994); in that paper traffic countings were carried out in 2892 intervals with a width t = 20 s, for a total of 4745 vehicular passages. These data were collected on three two-lane rural highways between 1989 and 1992; each road had different geometric alignment features, as well as different overtaking rules. Esposito and Mauro (1994) identified 33 steady-state periods; their data (minimum and maximum values) are summarised in Table 8. The regression curves obtained by Esposito and Mauro (1994) are shown in Figure 9 and Figure 10, together with the regression curves of the parameters p and k obtained by jointly examining the data of Esposito and Mauro (1994) and the data from the surveys carried out in the spring of 2012. It should be noted that the sample points obtained from the traffic countings made in 2012 are about half as many as those resulting from the countings carried out by Esposito and Mauro (1994).
Although carried out about two decades apart, the two traffic counting surveys give results that are essentially consistent with one another; they can be used in simulations for which a complete empirical analysis cannot be performed and in parameter estimation for statistical arrival distributions, provided that the data comply with the negative binomial theoretical distribution.

Conclusions
The paper deals with the identification, calibration and validation of statistical models for countings on two-lane rural highways, and especially with the definition and determination of the statistical stationarity periods of the flow rate on the basis of the analysis of observed data.
After a brief review of the probabilistic arrival models usually adopted in the study of traffic phenomena, a complete statistical analysis methodology has been developed to define stationarity periods, to verify the independence of the events (the sequence of arrivals) and to identify the probability laws for traffic countings. The procedure was applied to some samples of traffic countings, empirically surveyed on two two-lane rural highways; then, links were established between the flow rate and the parameters of the negative binomial distribution (found to comply with most of the data). However, the same analysis was not performed for the binomial, Poisson and Neyman type A distributions because of the very limited availability of data. Finally, a comparison was made between the results from the 2012 data collection and those reported in the previous paper by Esposito and Mauro (1994), who applied the same procedure to other data sets.
The analysis of the data collected specifically for this study confirms what was already reported by Esposito and Mauro (1994); indeed, it shows that the arrivals analysed on the two roads are mostly well modeled by the negative binomial counting distribution (which is an "aggregate" or "contagious" distribution), and that other data (though insufficient for in-depth analysis) are well modeled by the Neyman type A distribution, which is also an "aggregate" or "contagious" distribution. These results are consistent with the fact that the flow mainly recorded on the two examined infrastructures is what is commonly called a "platoon" flow. So, in general and even for low flow values, the Poisson law seems to be unsuitable for representing the arrivals on two-lane rural highways; other models are recommended in these cases. Such models have to take into account the formation of vehicle groups influenced by a leader.
Finally, as regards the relationships between the parameters of the negative binomial model and the flow rate, the comparison between the relations from this study and those by Esposito and Mauro (1994) has shown that the curves substantially coincide for flow rate values located approximately in the central part of the examined range (300 < Q < 500); the curves diverge at the extreme values, but not in a pronounced way.
The results obtained from the two traffic measurement surveys, although carried out about two decades apart, are essentially in good agreement, and they may be useful comparison tools for parameter estimates of arrival distributions, provided that the data comply with the negative binomial distribution. Such results can be useful in simulations when a complete empirical analysis cannot be performed. However, the limited size of the sample data analysed does not allow the relationships found to be extended to other two-lane rural highways without taking proper precautions or performing adequate sensitivity analyses.

Figure 1. Probability distributions on the plane (I,O,L)

Figure 4. Frequency distributions for set A1

By comparing the relations at different times, the curves appear to substantially coincide for flow rate values located approximately in the central part of the examined range (300 < Q < 500); the curves diverge at the extreme values, but not in a pronounced way. The curves obtained from the complete data analysis (data collected during the years 1989-1992 and 2012) are very close to those found in the paper by Esposito and Mauro (1994) (data collected during the years 1989-1992); this can be easily explained by considering that the sample points $(\hat{I}, \hat{L})$ from the 2012 countings are about half as many as those from the earlier surveys.

Figure 9. Regression curve between the negative binomial parameter p and the flow rate

Table 1 summarizes the criterion for choosing a model according to the values of the mean $\bar{x}$ and variance $s^2$.

Table 2. Theoretical distributions of arrival: probability distribution, mean, variance, parameter estimates

Table 4. Information about the countings on the Provincial Road No. 36 "delle Grazie"

Table 5. Information about the countings on the National Road No. 421 "dei Laghi di Molveno e Tenno"