On the Potential Application of Uniformity Tests in Circular Statistics to Chemical Processes

Three methods of circular statistics (Rao’s spacing test, Kuiper’s V-test, and Rayleigh’s test based on the mean resultant length) are applied to examine the hypothesis of uniform data distribution (i) in a hypothetical study of decaying effectiveness of electrocatalysis, and (ii) in a tank flow reactor with excessive local temperatures (“hot spots”). Numerical illustrations involve test results indicating rejection of, or failure to reject, the null hypothesis of population uniformity. The results confirm the power of Rao’s method demonstrated in pertinent literature.


Introduction
Circular statistics, employed over several decades in certain natural and life sciences, notably in biology, deals with the analysis of experimental observations of circular or spherical phenomena occurring along radial or axial directions. A fundamental step in its inception was the analysis, based on frequency distributions and histograms, of certain orientations of avian motion (Matthews, 1961; Schmidt-Koenig, 1964; Nievergelt, 1966). As in linear statistics, the main objective is to draw appropriate inferences about population parameters on the basis of samples. Observations are either of an ab ovo geometric nature, or of a temporal nature, where time-related distributions can be "fitted" into a circular or spherical pattern.
Three widely known methods for carrying out such a task are the subject of the current paper. Biological applications related to the techniques described in Sections 3.1 and 5.1 (Puri, Rao, and Yoon, 1976; Frei and Wagner, 1976; Wagner, 1972), in Sections 5.2 and 6.2 (Barnes, 1975; Merkel and Wiltschko, 1965; Perdeck, 1963), and in Sections 3.3 and 5.3 (Hölldobler, 1971; Walcott, 1974; W. Wiltschko and R. Wiltschko, 1975a, 1975b) illustrate the earlier predominant role of one particular life science in predicting the potential utility of circular statistics in other domains of scientific and technical endeavor.
The seminal works of Mardia (1972a), Batschelet (1981a) and Fisher (1993a), especially noteworthy for didactic/heuristic purposes, serve as major sources for the subject matter of this paper, whose motivation stems from the apparent lack of tangible awareness (at least to the knowledge of the author) of circular statistics in the chemical-sciences literature.
The specific aim here is to demonstrate the potential usefulness of circular statistics to chemical process analysis, in deciding whether experimental data of limited size obtained in a sample would justify the inference of uniformity of the population. Following a short presentation of theoretical aspects, numerical examples illustrate the calculations required for drawing proper conclusions about process performance. Owing to the difficulty of obtaining real-life numbers, posited hypothetical data are used exclusively to demonstrate practical applications with computational procedures, and to show the format of the required data presentation.

Hypothesis Testing: General Notions
Common to each testing method is the statement of a null hypothesis $H_0$: in Sections 1-3, the null hypothesis states that the sample is drawn from a uniform population (universe); i.e. the data in the population are uniformly distributed. In Section 4 the null hypothesis states the lack of significant difference between two non-uniform populations, each parenting one of the available samples. Each method rejects $H_0$ if its test statistic exceeds a critical value that usually depends on the sample size $n$ and the level of significance $\alpha$. In the traditional theory of statistics $\alpha = 0.05$ is termed significant and $\alpha = 0.01$ highly significant, meaning that the probability of wrongly rejecting a true $H_0$, usually called the Type 1 error, is 5% or 1%, respectively. The modern approach is more flexible in allowing the tester to decide on rejection of $H_0$ on the merit of the P-value of the test, drawing upon personal knowledge and experience. The P-value is the probability, computed under $H_0$, of obtaining a test statistic at least as extreme as the one observed; it measures the error risked in rejecting $H_0$ in the face of the computed test statistic.
The three tests considered here carry the names of (i) Rao (Rao, 1976; Russell & Levitin, 1995a; Batschelet, 1981b); (ii) Kuiper (Kuiper, 1960; Mardia, 1972b; Batschelet, 1981c; Fisher, 1993b), and (iii) Rayleigh (Mardia, 1972c; Fisher, 1993b). As seen in the sequel, Rao's method is more prone to reject $H_0$ than the Kuiper and the Rayleigh tests in the face of a sample, unless the data distribution is appreciably uniform at least in some of its sub-domains (Russell & Levitin, 1995a). Put otherwise, Rao's test carries a smaller Type 1 error than the other tests; however, it follows from the nature of statistical data (Porkess, 2005) that rejection of an $H_0$ cannot be absolutely certain.
Rejection of $H_0$ carries acceptance of the alternative hypothesis (counter hypothesis) $H_1$: the population distribution is non-uniform. It implies that deviations from uniformity are too large to be ascribed simply to chance factors, hence they are of deterministic origin. Since rejection of a null hypothesis is statistically stronger than its opposite, Rao's method is more inviting when at least medium-size deviations from uniformity are expected from prior inspection of the available data.

Rao's Spacing Test
The 91,100 Internet entries found in July 2012 attest to the widespread applicability of this method, whose test statistic is defined as

$$U = \sum_{i=1}^{n} (T_i - \lambda)^{+}, \qquad \lambda = \frac{360}{n} \tag{1}$$

where, expressed in degrees, $T_i = \theta_{i+1} - \theta_i$ for $i = 1, \ldots, n-1$; $T_n = 360 + \theta_1 - \theta_n$; $T_1 + T_2 + \ldots + T_n = 360$, given the angular observations $\theta_i$; $i = 1, 2, \ldots, n$ in ascending order. The "+" superscript indicates that the summation in Equation (1) extends only over positive values of $(T_i - \lambda)$. Summing over only the negative values of $(T_i - \lambda)$ yields the same magnitude but with a negative sign, hence $U$ in Equation (1) could also be written as one half of the sum of the absolute values $|T_i - \lambda|$ taken over all $i$.
The consistent use of degrees is desirable inasmuch as critical values $U_\alpha$ are tabulated in degree units (e.g. Batschelet, 1981b; Russell & Levitin, 1995b) as a function of sample size $n$ and significance level $\alpha$, for comparison with the value of the test statistic $U$ obtained via Equation (1). $H_0$ is rejected at significance level $\alpha$ if $U > U_\alpha$.
In general, if observations are obtained in equal-size circle segments (a frequent case in practice), the strength of Rao's test depends on the number of segments, inasmuch as the computed $U$-statistic will exceed its critical value, at a set level of significance, only past a certain "threshold" value of $n$. For instance, $U = 160.00$, not significant at $n = 23$, would be significant at $n = 24$, since $U_{0.05} = 160.01$ at $n = 23$ and $U_{0.05} = 159.48$ at $n = 24$ (Russell & Levitin, 1995b).
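The computation behind Equation (1) can be sketched in a few lines of Python. This is a minimal illustration, assuming angles are supplied in degrees within [0, 360]; the function names and the two small samples are hypothetical and serve only to show the arithmetic.

```python
def circular_spacings(angles_deg):
    """Spacings T_i (degrees) between consecutive sorted angles, closing the circle."""
    theta = sorted(angles_deg)
    if not theta:
        raise ValueError("at least one observation is required")
    spacings = [b - a for a, b in zip(theta, theta[1:])]
    spacings.append(360.0 + theta[0] - theta[-1])  # T_n wraps past 360 degrees
    return spacings


def rao_spacing_statistic(angles_deg):
    """Rao's U: sum of the positive deviations (T_i - lambda), lambda = 360/n."""
    spacings = circular_spacings(angles_deg)
    lam = 360.0 / len(spacings)
    return sum(t - lam for t in spacings if t > lam)


# Hypothetical samples, for illustration only.
evenly_spaced = [0, 45, 90, 135, 180, 225, 270, 315]
clustered = [10, 20, 30, 40]

u_even = rao_spacing_statistic(evenly_spaced)
u_clustered = rao_spacing_statistic(clustered)

# Equivalent form: one half of the sum of absolute deviations |T_i - lambda|.
lam = 360.0 / len(clustered)
half_abs = 0.5 * sum(abs(t - lam) for t in circular_spacings(clustered))

print(u_even, u_clustered, half_abs)
```

The resulting $U$ is then compared with a tabulated critical value $U_\alpha$ for the same $n$; the tables themselves are not reproduced in this sketch.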

Kuiper's Test
This section follows Fisher (1993b) closely, who recommends the prior construction of an [$i/(n+1)$ vs. $\theta_i$] quantile plot. If the line drawn through the (0, 0) origin at a 45° angle passes roughly through the observations, $H_0$ may be taken to be correct. The formal test in the case of ungrouped data consists of computing the modified Kuiper statistic

$$V = V_n \left( \sqrt{n} + 0.155 + \frac{0.24}{\sqrt{n}} \right) \tag{2}$$

where the Kuiper statistic $V_n = D_n^{+} + D_n^{-}$ is obtained from two sets of maxima:

$$D_n^{+} = \max_{1 \le i \le n} \left( \frac{i}{n} - x_i \right) \tag{3}$$

$$D_n^{-} = \max_{1 \le i \le n} \left( x_i - \frac{i-1}{n} \right) \tag{4}$$

with $x_i = \theta_i/360$, and $\theta_i$ measured in degrees. If the numerical value of $V$ exceeds the chosen critical value $V_\alpha$ (Arsham, 1988; Stephens, 1974; Fisher, 1993c), e.g. $V_{0.15} = 1.537$; $V_{0.10} = 1.620$; $V_{0.05} = 1.747$; $V_{0.01} = 2.001$, $H_0$ is rejected. Slightly different versions (Mardia, 1972b; Batschelet, 1981c) yield the same results.
In the case of grouped data with group-wise observation sizes $n_1, n_2, \ldots, n_k$; $(n_1 + n_2 + \ldots + n_k) = N$, and chosen recording angle $\psi_0$, the parameter $m = N/k$; $k = 360/\psi_0$ (the latter measured in degrees) should be at least 2 in each interval for the test to be applicable. The number of intervals is to be reduced until this condition is satisfied. Upon satisfactory grouping $H_0$ is rejected if the numerical value of the statistic

$$\chi^2 = \sum_{i=1}^{k} \frac{(n_i - m)^2}{m} \tag{5}$$

exceeds the critical value of the (conventional) chi-square statistic $\chi^2_\alpha(k-1)$. The tabulation of P-values as a function of test-statistic values and degrees of freedom $(k-1)$ (Batschelet, 1981d) is particularly useful for decision making that bypasses the conventional $\alpha = 0.05$ and $\alpha = 0.01$ levels of significance.
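The grouped-data procedure can be illustrated with a short Python sketch. It assumes the statistic is the conventional chi-square sum of $(n_i - m)^2/m$ with $m = N/k$, as implied by the comparison with $\chi^2_\alpha(k-1)$; the function name and the counts are hypothetical.

```python
def grouped_uniformity_chi_square(counts):
    """Chi-square statistic for uniformity of grouped circular data.

    counts: observed frequencies n_1..n_k in k equal-width angular intervals.
    Returns (statistic, degrees of freedom).
    """
    k = len(counts)
    total = sum(counts)
    m = total / k  # expected frequency per interval under uniformity
    if m < 2:
        # The text asks for the number of intervals to be reduced in this case.
        raise ValueError("expected frequency per interval is below 2; merge intervals")
    statistic = sum((n_i - m) ** 2 / m for n_i in counts)
    return statistic, k - 1


# Hypothetical counts in four 90-degree intervals (illustration only).
hypothetical_counts = [20, 10, 5, 5]
chi2_value, dof = grouped_uniformity_chi_square(hypothetical_counts)
print(chi2_value, dof)
```

The returned statistic is to be compared with a tabulated $\chi^2_\alpha(k-1)$, or converted to a P-value with a statistics library of the reader's choice.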

Testing for Significant Difference between Several Sets of Observations
If several separate sets of grouped random observations for a physical phenomenon or process have been obtained, they may or may not differ significantly from one another in a statistical sense. In the simplest case of two separate sets, the group frequencies are arranged into a conventional $(2 \times k)$ contingency table (Batschelet, 1981d) where the first row is made up of the frequencies of the first group: $(n_{11}, n_{12}, n_{13}, \ldots, n_{1k})$ with total frequency $M_1$, and the second row of the frequencies of the second group: $(n_{21}, n_{22}, n_{23}, \ldots, n_{2k})$ with total frequency $M_2$. Denoting by $e_{ij}$ the expected frequency of the $j$-th cell in the $i$-th row, i.e. the row total $M_i$ multiplied by the column total $(n_{1j} + n_{2j})$ and divided by the grand total $(M_1 + M_2)$, if the statistic

$$\chi^2 = \sum_{i=1}^{2} \sum_{j=1}^{k} \frac{(n_{ij} - e_{ij})^2}{e_{ij}} \tag{11}$$

exceeds the critical value of $\chi^2_\alpha(k-1)$, the null hypothesis $H_0$: no significant difference exists between the two observation sets (or, equivalently, the two sets have been drawn from the same population) is rejected at the $\alpha$ level of significance. If an expected frequency is less than five, coalescence with an adjacent group (cell) is necessary; this reduces the degrees of freedom, which could increase the chances of rejecting $H_0$.

Application to an Electrochemical Process: Monitoring the Decay of Electrocatalytic Activity
In a recent study of catalytic electrodes for the direct oxidation of methanol in fuel cells (Seo, 2012), the poisoning effect of carbon dioxide was monitored by measuring the decrease in cyclo-voltammetric (CV) current peaks as a function of electrode composition and time. In a hypothetical modification, an on-line setup employing twenty parallel CV cells operated under identical conditions is envisaged, where the time instants at which the peak current drops to 90% of the first (highest) forward peak current are logged and arranged into circular subdivisions of 360 degrees. In ascending order, the a-priori randomly observed angles $\theta_i$ ($i = 1, \ldots, 20$) representing the 10%-drop time instants are assumed to be 45, 47, 51, 100, 110, 115, 118, 160, 160, 161, 225, 230, 232, 267, 280, 285, 287, 290, 310, and 355 degrees. These measurements are taken to be a sample from a population of 10%-drop time instants converted into angular positions.

Kuiper's Test
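The observations are not grouped; thus $D_n^{+}$ and $D_n^{-}$ follow from Equations (3) and (4), and the resulting modified Kuiper statistic $V$ of Equation (2) is less than the critical value of 1.537 ($\alpha = 0.15$), so that rejecting $H_0$ would carry an error of at least 15%.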
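For readers who wish to retrace this step, the following Python sketch evaluates the Kuiper statistic for the twenty angles listed above. It assumes that Equations (3) and (4) are the usual maxima $D_n^{+} = \max(i/n - x_i)$ and $D_n^{-} = \max(x_i - (i-1)/n)$, and that Equation (2) is the modification $V = V_n(\sqrt{n} + 0.155 + 0.24/\sqrt{n})$ associated with the quoted critical values; the function name is hypothetical.

```python
import math


def modified_kuiper_statistic(angles_deg):
    """Modified Kuiper statistic V for ungrouped angles given in degrees.

    Assumed form (see lead-in): V = (D+ + D-) * (sqrt(n) + 0.155 + 0.24/sqrt(n)).
    """
    x = sorted(a / 360.0 for a in angles_deg)  # x_i = theta_i / 360
    n = len(x)
    d_plus = max((i + 1) / n - xi for i, xi in enumerate(x))   # max(i/n - x_i)
    d_minus = max(xi - i / n for i, xi in enumerate(x))        # max(x_i - (i-1)/n)
    v_n = d_plus + d_minus
    root_n = math.sqrt(n)
    return v_n * (root_n + 0.155 + 0.24 / root_n)


# Angles (degrees) of the 10%-drop time instants quoted in the text.
angles = [45, 47, 51, 100, 110, 115, 118, 160, 160, 161,
          225, 230, 232, 267, 280, 285, 287, 290, 310, 355]

v = modified_kuiper_statistic(angles)
print(round(v, 3))  # roughly 1.04, below the critical value 1.537
```

Under these assumptions the statistic comes out near 1.04, which agrees with the statement that it lies below the 15% critical value.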

Application to Nitrous Oxide Production: Monitoring Excessive Temperature Rise in a Tank-Flow Reactor
An explosion in a tank-flow reactor producing nitrous oxide by direct decomposition of an aqueous ammonium nitrate feed solution containing 83 mass% $\mathrm{NH_4NO_3}$ and 17 mass% $\mathrm{H_2O}$ (Fogler, 1992) was attributed to the cutoff of the feed, approximately four minutes prior to the explosion, caused by pressure fluctuations that most likely arose from excessive $\mathrm{N_2O}$ buildup in the reactor. An upright cylindrical tank-flow reactor is envisaged to operate in a pilot plant, with the nitrate feed entering at its bottom and the product stream exiting at its top. It is assumed that, in order to prevent such an explosion, the temperature is continuously monitored in each of the ten 36-degree-wide vertical segments at midpoint angular locations to alert operating personnel to the existence of potentially dangerous overheated locations (hot spots).
Hypothetical alarm frequencies observed in this manner over a fixed testing period are shown in Table 1. The null hypothesis is stated as $H_0$: the circular distribution of hot spots in the segments is uniform (in a sufficiently large number of reactors considered as a population, under identical experimental conditions). The total number of alarms being 104, the numerical value of $\lambda = 360/104$ is to be subtracted from each $T_i$. Alarms recorded within the same segment share its mid-point angle, so their mutual spacings are zero and do not contribute to $U$. At each of the ten segment boundaries the spacing $T_i = \theta_{i+1} - \theta_i$ is consistently 36 degrees, including the last entry of $360 - 342 + 18$. It follows that $U = 10(36 - 360/104) \approx 325$, hence the P-value is essentially zero, indicating that the hot spot distribution may be taken to be definitely not uniform.
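The arithmetic leading to $U \approx 325$ can be checked with the short Python sketch below. The per-segment counts are placeholders chosen only so that they sum to 104 and every segment has at least one alarm; they are not the entries of Table 1. The result does not depend on how the 104 alarms are split, as long as no segment is empty.

```python
def rao_spacing_statistic(angles_deg):
    """Rao's U: sum of positive deviations (T_i - lambda), with lambda = 360/n."""
    theta = sorted(angles_deg)
    spacings = [b - a for a, b in zip(theta, theta[1:])]
    spacings.append(360.0 + theta[0] - theta[-1])  # closing spacing across 360
    lam = 360.0 / len(theta)
    return sum(t - lam for t in spacings if t > lam)


# Mid-point angles of the ten 36-degree segments: 18, 54, ..., 342.
midpoints = [18 + 36 * i for i in range(10)]

# Placeholder counts (NOT the Table 1 values); they only need to sum to 104.
placeholder_counts = [12, 9, 14, 8, 11, 10, 13, 7, 9, 11]

# Each alarm is recorded at the mid-point angle of its segment.
angles = [angle
          for angle, count in zip(midpoints, placeholder_counts)
          for _ in range(count)]

u = rao_spacing_statistic(angles)
print(len(angles), round(u, 2))
```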

Comparison of the Single-Sample Tests
The observations concerning the methanol fuel cell are only mildly uniform over four short regions (45-51; 110-118; 225-232; 280-290 degrees), hence they do not meet the locally strong uniformity stipulation (Russell & Levitin, 1995a) for reducing the strength of Rao's test. Kuiper's and Rayleigh's tests exhibit large P-values, although the former is somewhat less realistic with respect to the powerful rejection of uniformity by Rao's test.
In the case of the nitrous oxide reactor Rao's and Rayleigh's tests yield very small P-values, implying a strong rejection of the hypothesis of uniformity, while Kuiper's test offers only a "borderline" rejection.
It is instructive to consider three particular (of the many possible) extreme situations hypothesized for the nitrous oxide plant: (1) twenty observation angles are uniformly 18 degrees apart; (2) ten observation angles are repeatedly 180, and the other ten repeatedly 360 degrees; (3) there are five repeated angular observations at each of the 90, 180, 270, and 360 degree positions. Rao's test, yielding $U_1 = 0$, $U_2 = 324$ and $U_3 = 288$, correctly infers extreme uniformity (Case 1) and extreme non-uniformity (Cases 2 and 3). With $V_1 = 0.234$, $V_2 = 2.3404$ and $V_3 = 1.1702$ computed via Equations (2-4), Kuiper's test rejects the null hypothesis of uniformity only in Case 2, while Rayleigh's test retains the null hypothesis of uniformity inasmuch as it yields $R = 0$, hence $Z = 0$ and $P = 1$ in each case. The strength of Rao's test demonstrated here is not generally attainable at all conceivable extremes.
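The three extreme cases can be retraced with the following Python sketch. It assumes the same form of the modified Kuiper statistic as quoted with the critical values, $V = V_n(\sqrt{n} + 0.155 + 0.24/\sqrt{n})$, and takes $R$ to be the usual mean resultant length; all function names are hypothetical.

```python
import math


def rao_u(angles_deg):
    """Rao's U: sum of positive deviations of the spacings from 360/n."""
    theta = sorted(angles_deg)
    spacings = [b - a for a, b in zip(theta, theta[1:])]
    spacings.append(360.0 + theta[0] - theta[-1])
    lam = 360.0 / len(theta)
    return sum(t - lam for t in spacings if t > lam)


def kuiper_v(angles_deg):
    """Modified Kuiper statistic (assumed form, see lead-in)."""
    x = sorted(a / 360.0 for a in angles_deg)
    n = len(x)
    d_plus = max((i + 1) / n - xi for i, xi in enumerate(x))
    d_minus = max(xi - i / n for i, xi in enumerate(x))
    root_n = math.sqrt(n)
    return (d_plus + d_minus) * (root_n + 0.155 + 0.24 / root_n)


def mean_resultant_length(angles_deg):
    """Length of the mean of the unit vectors pointing at the observed angles."""
    rad = [math.radians(a) for a in angles_deg]
    c = sum(math.cos(r) for r in rad) / len(rad)
    s = sum(math.sin(r) for r in rad) / len(rad)
    return math.hypot(c, s)


cases = {
    1: [18 * i for i in range(1, 21)],                   # 18 degrees apart
    2: [180] * 10 + [360] * 10,                          # two opposite clusters
    3: [90] * 5 + [180] * 5 + [270] * 5 + [360] * 5,     # four clusters
}

results = {label: (rao_u(a), kuiper_v(a), mean_resultant_length(a))
           for label, a in cases.items()}

for label, (u, v, r) in results.items():
    print(label, round(u, 2), round(v, 4), round(r, 6))
```

The sketch shows why Rayleigh's test is blind to these configurations: the unit vectors cancel exactly in all three cases, so the mean resultant length vanishes even when the data are concentrated at two or four angles.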

Analysis of the Two-Sample Results
The last cell in Table 2, carrying $e_{2,20}$ whose value is slightly less than 5, would require, in a rigorous treatment, the coalescence of the nineteenth and the twentieth cell, producing new entries $n_{2,19} = 20$ and $e_{2,19} = 15.125$; the difference between the rigorous value of $\chi^2 = 36.89$ and $\chi^2 = 38.40$ obtained via Equation (11) is of no practical importance since both indicate a strong rejection of $H_0$.
When the number of observations is small, visually inferred non-uniformity may not receive strong statistical confirmation. If, for instance, in the nitrous oxide plant two consecutive data sets, each with four alarm frequencies, (23; 32; 24; 18) and (44; 27; 18; 11), were available, the chi-square value of $\chi^2 = 9.507$ obtained via Equation (11) would indicate a P-value of about 0.024, i.e. rejection of $H_0$ at a significant, but not at a highly significant, level according to conventional statistics. If three consecutive sets with alarm frequencies (23; 32; 24; 18), (42; 27; 18; 9) and (33; 29; 16; 21) were available, the test statistic $\chi^2 = 12.64$ with $(3-1)(4-1) = 6$ degrees of freedom would indicate rejection of $H_0$ for all practical purposes at a 5% level of significance due to the P-value = 0.0491.
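The chi-square values quoted in this paragraph can be retraced with a short Python sketch. It assumes that Equation (11) is the conventional contingency-table statistic, with each expected frequency equal to (row total × column total)/(grand total); the function name is hypothetical, and P-values are left to a statistics table or library.

```python
def contingency_chi_square(rows):
    """Chi-square statistic and degrees of freedom for an r x k contingency table.

    rows: list of r lists, each holding the k observed frequencies of one data set.
    """
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for row, m_i in zip(rows, row_totals):
        for n_ij, c_j in zip(row, col_totals):
            e_ij = m_i * c_j / grand_total  # expected frequency of the cell
            statistic += (n_ij - e_ij) ** 2 / e_ij
    dof = (len(rows) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Alarm frequencies quoted in the text.
two_sets = [[23, 32, 24, 18], [44, 27, 18, 11]]
three_sets = [[23, 32, 24, 18], [42, 27, 18, 9], [33, 29, 16, 21]]

chi2_two, dof_two = contingency_chi_square(two_sets)
chi2_three, dof_three = contingency_chi_square(three_sets)
print(round(chi2_two, 1), dof_two, round(chi2_three, 1), dof_three)
```

Under this assumption the two statistics come out near 9.5 (3 degrees of freedom) and 12.6 (6 degrees of freedom), in line with the values quoted above up to rounding.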
If the uniformity of the data population has been rejected, the process analyst might wish to test, in a similar manner, the admissibility of a probability distribution of the analyst's own choosing, with expected frequencies computed from distribution functions pertaining to the probability model under consideration. This topic, beyond the scope of the current paper, is well illustrated in the case of the widely used von Mises distribution (Mardia, 1972d) and its application to chemical systems (Fahidy, 2012).

Final Remarks
The fundamental goal of the material presented here is to provide the process analyst/designer with useful means to assess chemical process performance using the tools of circular statistics, bearing in mind the maxim: "… the purpose of computing is insight, not numbers…" (Hamming, 1973). Much remains to be explored in this area by chemical scientists and engineers with an interest in statistical techniques.

Table 1. Alarm frequencies observed during a test period in the hypothetical nitrous oxide plant of Section 6.
Columns: segment index $i$; mid-point angle within the $i$-th segment (degrees); number of warning signals.

Table 2. Contingency table for two sets of observed and expected alarm frequencies in the hypothetical nitrous oxide plant of Section 6. $M_1 = 104$; $M_2 = 88$; $M_3 = 192$.