On the Classification of Colored Textures From a Texture-Ranking Experiment: Observers Ability of Discrimination Quantification

Amadou Sawadogo (1), Dominique Lafon (2), Simplice Dossou-Gbété (3)
(1) Department of Mathematics and Computer, University of Félix Houphouet Boigny, Abidjan, Ivory Coast
(2) Ecole des Mines d’Alès, Site Hélioparc, Pau, France
(3) Université de Pau et des Pays de l’Adour, E2S UPPA, CNRS, LMAP, Pau, France. International Chair in Mathematical Physics and Applications (ICMPA-UNESCO Chair), University of Abomey-Calavi, Benin


Introduction
In contrast with the past, the sensory properties of materials are receiving growing attention from both a hedonic and a utilitarian point of view. In industrial sectors such as luxury goods, cosmetics or transport, the visual appearance of objects is a criterion of evaluation and decision in customers' choices. Effect colors (metallic and pearlescent) are now commonly used. Depending on the viewing angle, they produce different visual impressions.
The classical color descriptors (Fairchild, 2013; Hunt & Pointer, 2011), e.g., CIE Lab, CIE Luv, CIECAM02, etc., do not provide a complete representation of these effect colors or, more generally, of surfaces exhibiting a colored texture. It is now clearly established that our visual system is concerned not only with individual colors but also with contrasts. However, it remains unclear how to characterize visual similarities between textures belonging to the same type.
The purpose of this study was to construct Fechnerian scales on a physical dimension of stimuli capable of describing the behavior of the average observer with respect to the perceived contrast of colored texture types. We investigated the similarity relations between isochromatic colored textures that differ only slightly in luminance level. To this end, the Michelson contrast is taken as the physical continuum in the physical domain of the colored textures. The colored textures consisted of four types of texture (Random-dots textures, Isotropic textures, Horizontal gratings and Vertical gratings) and three color ranges (red, green and yellow). We then conducted a softcopy experiment in which observers ranked the colored textures according to the attribute of visual contrast. Ties were permitted in the ranking experiment, so the collected data are rank data with ties.
In the literature, many models of rank data have been proposed over the years. They can be categorized into four classes (Marden, 1995; Fligner & Verducci, 1986; Fligner & Verducci, 1988): (i) paired-comparison models, (ii) distance-based models, (iii) multistage models and (iv) order-statistics models. Applications of these models to real-life situations can be found in the literature; see, for example, Marden (1995) and Critchlow & Fligner (1991). In the statistical modeling of ranking data, the paired-comparison approach is widely used and relies upon a clear formulation of probability distribution models. The basic idea behind this approach, and in particular behind the model considered here for the analysis of the rank data, is that the observer reports a ranking only after reaching an unambiguous ordering through paired comparisons, with the possibility of ties. In other words, the multiple pairwise judgments are consistent, where consistency means transitivity. Hence, each pair of stimuli has a triple of probabilities attached to it. Rao and Kupper (1967) and Davidson (1970) suggested particular parametrizations of the paired comparisons when extending the Bradley-Terry model to account for ties. The parametrization assumed here is Davidson's model (Davidson, 1970), defined as follows: stimulus $i$ or stimulus $j$ is chosen with respective probabilities $\theta_{ij} = \pi_i/(\pi_i + \pi_j + \nu\sqrt{\pi_i \pi_j})$ and $\theta_{ji} = \pi_j/(\pi_i + \pi_j + \nu\sqrt{\pi_i \pi_j})$, and no choice is made (a tie) with probability $\theta_{ij2} = \nu\sqrt{\pi_i \pi_j}/(\pi_i + \pi_j + \nu\sqrt{\pi_i \pi_j})$, with $\nu \geq 0$. Here $\pi_i$ denotes the "worth" of the $i$th stimulus, an index of relative preference, and $\nu$ is an index of discrimination which, in our context, is interpreted as a measure of the difficulty the population of observers has in discriminating between the stimuli.
From this point of view, we made use of the extension (Sawadogo et al., 2017) of the Mallows-Bradley-Terry ranking model for one block comparison consisting of all the stimuli of interest to analyze the collected ranking data.
This paper is organized as follows. Section 2 describes the apparatus, the investigated stimuli and the raw data set, and indicates how the data were collected. Section 3 recalls the generalization of the Mallows-Bradley-Terry ranking model. In Section 4, a likelihood ratio test is first performed to answer the question of the discriminability of the stimuli. Secondly, post-hoc tests are applied to find out which stimuli differ from the others. Thirdly, we address the construction of Fechnerian scales on the physical dimension of stimuli using different psychophysical methods and indicate how such scales differ. Section 5 discusses our findings and compares them with recent studies.
The data used for this study were collected by a softcopy experiment.

Apparatus
Stimuli were presented on a color CRT monitor (Philips) with a refresh rate of 85 Hz and a resolution of 1600 × 1200 pixels. Prior to the experiments, the CRT monitor was accurately characterized and calibrated in order to control the colorimetric attributes of the displayed colors. The white point of the monitor was set to the standard illuminant D65. A colorimetric calibration of the CRT monitor with a PR650 spectroradiometer (Photo Research Inc., USA) made it possible to display colored textures corresponding to predetermined colors physically expressed in the standard CIE 1931 (Commission internationale de l'éclairage) notation (X, Y, Z). The observers sat 60 centimeters away from the screen. The experimental environment was strictly controlled, since it was important to minimize spatial color interference with the perception of the stimuli displayed on the monitor screen. The experiment was carried out under diffuse lighting conditions. The light source in the room was a courtesy light with a fluorescent tube (Just Normlicht Color Control Daylight 5000 kelvins) and a light diffuser. The experiment took place in a windowless room with the same neutral grey color for walls, ceiling and floor in order to avoid possible local variations of the ambient light. The surroundings of the monitor were hidden by grey surfaces in order to reduce visual distractions (only the screen and the mouse were distinguishable from the background).

Stimuli and Contrast-Sorting Experiment
The experiment consisted in having a total of forty observers rank isochromatic textures. The observers had normal or corrected-to-normal visual acuity and normal color vision. They were informed about the final purpose of the experiment. Prior to the start of the experiment, each observer was trained in order to understand what is meant by visual contrast: we used series of colored textures of the same natural scene exhibiting different levels of visual contrast. The experiment began once each observer had understood the task. Each of the forty observers completed 12 trials. An isochromatic texture is characterized by its color range and the type of texture it belongs to. A color belonging to a given color range is described by its chromaticity coordinates and luminance (x, y, Y) in the well-known three-dimensional CIE XYZ color space (Bonton et al., 2000): (x, y) denotes its chromaticity coordinates and Y its luminance. A colored texture of a given type consists of two colors with the same chromaticity, described by the coordinates $(x_0, y_0, Y_0)$ and $(x_0, y_0, Y_0 + \Delta Y)$: for each color range, a luminance value $Y_0$ was chosen as the reference, and a luminance discrepancy $\Delta Y = Y - Y_0$ was then applied. The Michelson contrast level $M$ of a given colored texture is defined by $M = |\Delta Y|/(Y_0 + 2|\Delta Y|)$. A sample of the texture types is presented in Figure 1. A trial consisted in simultaneously displaying on the CRT monitor screen 20 colored textures belonging to the same type of texture pattern and the same color range, and then ranking them according to the visual contrast perceived between the two components of each colored texture (from the lowest visual contrast, corresponding to a very small difference between the luminances of the two components, to the highest visual contrast).
The observers were asked to use integers from 1 to 20, rank 1 being assigned to the texture with the lowest visual contrast. Ties were permitted and expressed by assigning the same rank to stimuli that the observer perceived as having the same visual contrast. For each trial, the colored textures were randomly displayed on a uniform gray field. After completing a trial, the observer keyed in his or her responses and then clicked to request a new trial. The observers were free to use as much time as they found necessary for the set of 12 trials. The 12 trials were presented in a pseudo-random order so as to avoid familiarization: two successive trials presented to an observer were necessarily composed of stimuli with different color ranges and different texture types. Each observer took an average of 90 minutes to complete a full round. The textures consisted of square uncompressed bitmap images (200 × 200 pixels) with a color depth of 24 bits per pixel. They were displayed without any resizing as squares of 5 centimeters × 5 centimeters (5 degrees of visual angle). Three color ranges (red, green and yellow) and four types of texture pattern (Random-dots texture, Isotropic texture, Horizontal grating and Vertical grating texture) were investigated. The chromaticity coordinates and luminance of the colors considered in each color range are given in Table 1. The textures are identified by two letters according to their color range and texture type: the first letter denotes the color range and the second the texture type. The Red, Green and Yellow color ranges are identified by the letters R, G and Y, respectively. The texture types, namely Random-dot textures, Isotropic textures, Horizontal gratings and Vertical gratings, are identified by the letters R, I, H and V, respectively.
Thus, the label GR denotes the set of colored textures in the Green color range belonging to the Random-dots texture type; any colored texture of the set GR is identified by the prefix GR.
The collected data were organized as follows: the data set consists of one table for each element of the set {R, G, Y} × {R, I, H, V}, each with 40 rows (the observers) and 20 columns (the colored textures).
Finally, each data record is described by the observer's identifier, the color range, the type of texture, the luminance discrepancy ∆Y of the colored texture, the Michelson contrast level M associated with that luminance discrepancy and the rank assigned by the observer to the colored texture.
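As an illustration of this layout, the following Python sketch builds one empty 40 × 20 table per combination of color range and texture type. The variable names and the use of nested lists are assumptions made for the example; the original analysis was carried out in R and its internal data structures are not described in the text.

```python
from itertools import product

COLOR_RANGES = ["R", "G", "Y"]        # Red, Green, Yellow
TEXTURE_TYPES = ["R", "I", "H", "V"]  # Random-dots, Isotropic, Horizontal, Vertical
N_OBSERVERS = 40
N_TEXTURES = 20


def empty_rank_tables():
    """Return a dict mapping labels such as 'GR' to a 40 x 20 table.

    Each cell starts as None and is meant to hold the rank (1..20, ties
    allowed) given by one observer to one colored texture.
    """
    tables = {}
    for color, texture in product(COLOR_RANGES, TEXTURE_TYPES):
        label = color + texture  # e.g. 'GR' = Green range, Random-dots type
        tables[label] = [[None] * N_TEXTURES for _ in range(N_OBSERVERS)]
    return tables


tables = empty_rank_tables()
```

Under these assumptions, `tables["GR"][k][j]` would store the rank that observer k assigned to the j-th texture of the set GR.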
The statistical model used for the analysis of the ranking data is described and defined in the section below.

The Extension of the Mallows-Bradley-Terry Ranking Model
As mentioned in the Introduction, the paired-comparison approach is the one considered here. The basic idea behind this approach is that the observer reports a ranking only after reaching an unambiguous ordering through paired comparisons. By unambiguous ranking we mean transitivity in the set of paired comparisons.
Suppose that the paired comparison experiment involves $q$ stimuli, where independent comparisons are made for each pair of stimuli $\{i, j\}$. It is assumed that ties are not permitted and that the order of item presentation is unimportant. The basic parameters are then the $q(q-1)/2$ quantities $\theta_{ij}$, where for $i < j$, $\theta_{ij}$ denotes the probability that $i$ is preferred to $j$ in a paired comparison of these two stimuli. The Bradley-Terry model is given by $\theta_{ij} = \pi_i/(\pi_i + \pi_j)$, where $\pi_i > 0$ is the probability that stimulus $i$ is ranked lowest when the entire set of $q$ stimuli is submitted for ranking.
In the most well-studied case, there is simply one block consisting of all the stimuli of interest, and the resulting comparison is just a complete ranking of these stimuli provided by each of $n$ observers. To model the probability of such a ranking $r$ of $q$ stimuli, Mallows (1957) proposed a natural extension of the Bradley-Terry paired comparison model. According to this extension, the probability of the rank vector $r$ is given by Equation (1), where $c(\pi)$ stands for the normalizing constant. This well-known result is called the Mallows-Bradley-Terry ranking model (Critchlow & Fligner, 1991). Sawadogo et al. (2017) adapted the Mallows-Bradley-Terry ranking model to ties by considering the Davidson (1970) extension of the Bradley-Terry model. To account for ties in a paired comparison experiment, Davidson proposed the following extension: stimulus $i$ or stimulus $j$ is chosen with respective probabilities $\theta_{ij} = \pi_i/(\pi_i + \pi_j + \nu\sqrt{\pi_i \pi_j})$ and $\theta_{ji} = \pi_j/(\pi_i + \pi_j + \nu\sqrt{\pi_i \pi_j})$, and no choice is made (a tie) with probability $\theta_{ij2} = \nu\sqrt{\pi_i \pi_j}/(\pi_i + \pi_j + \nu\sqrt{\pi_i \pi_j})$, with $\nu \geq 0$. Here $\pi_i$ denotes the "worth" of the $i$th stimulus, an index of relative preference, and $\nu$ is an index of discrimination which, in our context, is interpreted as a measure of the difficulty the population of observers has in discriminating between the stimuli. Notice that the Bradley-Terry model is obtained from Davidson's model when $\nu = 0$.
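The three probabilities of Davidson's model can be illustrated numerically. The following Python sketch evaluates them for one pair of stimuli; the function name and the numerical worths are hypothetical and serve only as an example, not as values from the study.

```python
import math


def davidson_probabilities(pi_i, pi_j, nu):
    """Return (P(i chosen), P(j chosen), P(tie)) under Davidson's model."""
    if pi_i <= 0 or pi_j <= 0 or nu < 0:
        raise ValueError("worths must be positive and nu non-negative")
    tie_term = nu * math.sqrt(pi_i * pi_j)
    denom = pi_i + pi_j + tie_term
    return pi_i / denom, pi_j / denom, tie_term / denom


# Hypothetical worths 0.6 and 0.4 with a moderate index of discrimination.
p_i, p_j, p_tie = davidson_probabilities(0.6, 0.4, 0.5)

# With nu = 0 the tie probability vanishes and Bradley-Terry is recovered.
bt_i, bt_j, bt_tie = davidson_probabilities(0.6, 0.4, 0.0)
```

The three probabilities always sum to one, and increasing `nu` moves probability mass from the two choices toward the tie.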
Let us consider a set of $q$ items labeled by distinct numbers from 1 to $q$. The $q$ items are submitted to an observer for ranking with ties. A ranking is obtained by independently making all the pairwise comparisons until an unambiguous ranking is reached. The resulting rank vector $r$ is a finite sequence of integers of length $q$, where each value belongs to the set of consecutive integers from 1 to $q$. The $j$-th component of $r$, say $r(j)$, is the rank assigned to the item labeled $j$. In what follows, we introduce the definitions used to define the extension of the Mallows-Bradley-Terry ranking model that accounts for ties.
Definition 2 Given a ranking $r = \{r(j)\}_{j=1,2,\ldots,q}$ of the set $A(q)$, the mid-rank of stimulus $i$, denoted r (i), is defined by
The extension of the Mallows-Bradley-Terry model for ranking with ties is defined as follows.
Definition 3 The generalization of the Mallows-Bradley-Terry model for ranking with ties is defined by the following probability mass function, where A(q) denotes the sample space.
The parameters $\pi_j$, $j \in \{1, 2, \ldots, q\}$, denote the "worth", an index of relative preference, of the $j$-th stimulus, with $\pi_j > 0$ and $\sum_{j=1}^{q} \pi_j = 1$. The parameter $\nu$ is an index of discrimination, interpreted as a measure of the difficulty the whole population of observers has in discriminating between the stimuli being compared. One can remark that if $r$ is a ranking without ties, the model given by Equation (2) is exactly the classical Mallows-Bradley-Terry ranking model defined above by Equation (1), because one then has $r_+ = q$ and $\lambda_r(k) = 1$ for all $k = 1, \ldots, q$, so that the mid-rank of each stimulus $i$ coincides with its rank $r(i)$, $i = 1, \ldots, q$. The Fechnerian scale provides a continuous, unidimensional, dimensionless variable for preference. It expresses the degree of preference shown by an experimental observer.
Definition 4 The Mallows-Bradley-Terry model for ranking with ties is defined as in Equation (3), where $c(\theta, \gamma)$ denotes the normalizing constant.
Throughout the remainder of the paper, we will use the extension of the Mallows-Bradley-Terry ranking model in the form given by Equation (3).
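The notion of mid-rank used in Definition 2 can be illustrated with a short Python sketch. It assumes the usual convention in which tied items share the average of the positions they jointly occupy; the function name is hypothetical, and the exact formula of Definition 2 may be written differently.

```python
def mid_ranks(ranking):
    """Return the mid-rank of every item for a ranking with ties.

    ranking[i] is the rank given to item i; tied items share the same rank.
    Assumed convention: tied items receive the average of the positions
    they jointly occupy in the ordering.
    """
    result = []
    for rank in ranking:
        below = sum(1 for other in ranking if other < rank)   # items ranked strictly lower
        tied = sum(1 for other in ranking if other == rank)   # size of the tie group (includes the item itself)
        result.append(below + (tied + 1) / 2)
    return result


# Four items, the second and third being tied.
example = mid_ranks([1, 2, 2, 3])

# Without ties, the mid-ranks coincide with the ranks themselves.
no_ties = mid_ranks([2, 1, 3])
```

In the first example the two tied items occupy positions 2 and 3 and therefore both receive the mid-rank 2.5, while the last item receives 4.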

Likelihood Ratio Testing: Discrimination Versus Non-discrimination
From Davidson's model stated above, the probability of no choice, $\theta_{ij2} = \nu\sqrt{\pi_i \pi_j}/(\pi_i + \pi_j + \nu\sqrt{\pi_i \pi_j})$ with $\nu \geq 0$, tends to 1 when $\nu$ tends to infinity. The setting $\nu = 0$ corresponds to certain discrimination, since ties then have probability zero. Non-discrimination means that there is an equal probability of choosing either of the two stimuli, whatever the stimuli concerned, which leads to $\nu > 0$ and $\pi_j = 1/q$ for all $j \in \{1, 2, \ldots, q\}$. Thus, the question of the discrimination of the stimuli is answered by hypothesis testing: the hypothesis of non-discrimination of the textures versus the hypothesis of discrimination. We were interested in testing the null hypothesis $H_0: \pi_i = 1/q, \ \forall i \in \{1, 2, \ldots, q\}, \ \nu > 0$, which corresponds to non-discrimination between the stimuli, versus the alternative hypothesis $H_1: \exists i, \ \pi_i \neq 1/q, \ \nu > 0$, which means discrimination between them. A likelihood ratio test is obtained by using $-2\log\Lambda_n$ as the test statistic, where $\Lambda_n$ is the likelihood ratio of Equation (6) and $L_n(\theta, \gamma) = \exp\{l_n(\theta, \gamma)\}$ denotes the likelihood of the model parameter vector $(\theta, \gamma)$. Under $H_0$, the statistic $-2\log\Lambda_n$ has an approximate chi-square distribution with $q - 1$ degrees of freedom when $n$ tends to infinity. The null hypothesis $H_0$ is rejected if $-2\log\lambda_n > \chi^2_{1-\alpha}(q-1)$, where $\lambda_n$ is the observed value of the statistic $\Lambda_n$ and $\chi^2_{1-\alpha}(q-1)$ is the $(1-\alpha)$ quantile of the chi-square distribution with $q - 1$ degrees of freedom. Equivalently, $H_0$ is rejected if the p-value is smaller than $\alpha$.
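The decision rule of the test can be sketched in Python as follows. The sketch takes the value of the statistic $-2\log\lambda_n$ as given and computes the chi-square tail probability with a plain power series, so that no external library is needed; the function names and the example values of the statistic are hypothetical, and the series loses accuracy for extremely small p-values.

```python
import math


def chi2_sf(x, df):
    """Upper tail P(X > x) of a chi-square variable with df degrees of freedom.

    Uses the power series of the regularized lower incomplete gamma function.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > 1e-15 * abs(total):
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def lrt_decision(minus_two_log_lambda, q, alpha=0.05):
    """Return (p_value, reject_H0) for the test with q - 1 degrees of freedom."""
    p_value = chi2_sf(minus_two_log_lambda, q - 1)
    return p_value, p_value < alpha


# Hypothetical statistic values for q = 20 stimuli (19 degrees of freedom).
p_large, reject_large = lrt_decision(40.0, 20)
p_small, reject_small = lrt_decision(10.0, 20)
```

With 19 degrees of freedom the 95% quantile is close to 30.1, so a statistic of 40 leads to rejection of $H_0$ while a statistic of 10 does not.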
The computation of the ratio $c(\hat{\theta}, \hat{\gamma})/c(\theta_0, \gamma_0)$ of the normalizing constants involved in Equation (6) is required and involves sums with a huge number of terms. There is no closed-form analytical expression for this ratio.
To overcome this difficulty, we use the path sampling technique; see, e.g., Hunter and Handcock (2012). This technique is closely related to the Monte Carlo method called bridge sampling (e.g., Meng and Wong, 1996). The following results describe how it works in our case.
Proposition 3 One has, is a sufficient statistic for the parameter vector $\varphi(u) = (\theta(u), \gamma(u))$. The expectation $E$ is taken with respect to the joint distribution of the random vector $(U, r)$, where $U$ is uniform on $[0, 1]$ and $r \mid U = u$ is distributed according to $P(r; \varphi(u))$.
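The idea behind path sampling can be illustrated on a toy exponential family small enough for the exact answer to be available. The Python sketch below is not the authors' implementation: the state space, the statistic, the linear path and all names are assumptions made for the example, and `Z` denotes the sum of the unnormalized weights, with no claim about whether the constant $c$ of the model corresponds to `Z` or to its reciprocal. In the ranking model the states could not be enumerated and the draws would have to come from a sampler.

```python
import math
import random


def path_sampling_log_ratio(stat, states, phi0, phi1, n_draws=20000, seed=0):
    """Monte Carlo estimate of log Z(phi1) - log Z(phi0).

    Unnormalized weights are exp(phi * stat(x)) and Z(phi) is their sum over
    `states`. Along the linear path phi(u) = phi0 + u * (phi1 - phi0), the
    derivative of the log weight is (phi1 - phi0) * stat(x), which is averaged
    over U ~ Uniform[0, 1] and x drawn from the model at phi(U).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        u = rng.random()
        phi = phi0 + u * (phi1 - phi0)
        weights = [math.exp(phi * stat(x)) for x in states]
        x = rng.choices(states, weights=weights, k=1)[0]
        total += (phi1 - phi0) * stat(x)
    return total / n_draws


def exact_log_ratio(stat, states, phi0, phi1):
    """Exact value by enumeration, available only because the toy space is tiny."""
    z0 = sum(math.exp(phi0 * stat(x)) for x in states)
    z1 = sum(math.exp(phi1 * stat(x)) for x in states)
    return math.log(z1 / z0)


states = list(range(5))                      # toy state space {0, 1, 2, 3, 4}
estimate = path_sampling_log_ratio(lambda x: x, states, 0.0, 1.0)
exact = exact_log_ratio(lambda x: x, states, 0.0, 1.0)
```

On this toy problem the estimate can be compared directly with the enumerated value, which is the kind of check that is unavailable when the sums have a huge number of terms.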

Rank Data Analysis and Fechnerian Scales Constructing
The data analysis by means of a fixed-effects additive model showed that visual contrast is a psychophysical scale corresponding to the physical scale determined by the Michelson contrast of the colored textures (Sawadogo et al., 2014). Furthermore, it was observed graphically that the ability of the observers to discriminate between the colored textures according to visual contrast varies with the color ranges and the texture types. However, that study neither quantified the discrimination ability of the set of observers nor established a classification of the texture types and/or the color ranges. In the present paper, our objective is to provide some answers to these questions and to make assumptions about the related Fechnerian sensory scales.
The model parameter vector $(\pi, \nu)$ was estimated by means of a Monte Carlo MM algorithm; see Sawadogo et al. (2017). Bootstrap-t confidence intervals $CI_{1-\alpha}(\pi, \nu)$ were computed for the model parameter vector $(\pi, \nu)$ with $\alpha = 5\%$, i.e., at the 95% confidence level. For this purpose, we wrote our own code in R (R Core Team, 2020). The estimates of the model parameter vector $(\pi, \nu)$ are available in Appendix 2. In multiple regression, the coefficient of determination $R^2$ (the squared multiple correlation coefficient) is often used to measure the percentage of variation in the data that is explained by the regression model. An analogous quantity in our context is the percentage of nonuniformity $R^2$ in the data that is explained by the model; see, e.g., Marden (1995). See Appendix 3 for the definition of $R^2$ and its values for our data. The results show that the model fits all the data well. The estimates of the worth parameters $\pi_j$, $j \in \{1, 2, \ldots, q\}$, of the colored textures increase when the corresponding Michelson contrast levels decrease, whatever the sign of the luminance discrepancy $\Delta Y$. Thus, an ordering relation based on the Michelson contrasts of the colored textures always corresponds to the ordering obtained during the psychophysical experiments, whatever the type of colored texture and the color range. This suggests considering visual contrast as a psychophysical scale corresponding to the physical scale determined by the textures' Michelson contrasts $M$. This result is consistent with that found in Sawadogo et al. (2014).
The likelihood ratio tests of $H_0$ versus $H_1$ result in the rejection of $H_0$ at the significance level $\alpha = 0.05$ for all the observed data, since the p-values (see Appendix 4) are all less than 0.05. It follows that the observers, taken as a population, were able to discriminate between the colored textures according to visual contrast at the 5% significance level.
From the likelihood ratio tests, significant differences in luminance contrast are detected between the colored textures, whatever the color range and the texture type. Given a color range and a texture type, it is then necessary to find out which colored textures differ from the others (Dzhafarov & Perry, 2014). The hypotheses to test are therefore $H_0: m_k = m_j$ versus $H_1: m_k \neq m_j$ for $k \neq j$, where $m_k$ represents the median of stimulus $k$. A Conover post-hoc test (Conover, 1999) with p-values adjusted according to the Holm procedure (Holm, 1979) was used. To this end, we made use of the PMCMRplus package (Pohlert, 2019) available in R. The results of the multiple paired comparison tests are available in Appendix 5, where a box-and-whisker plot is drawn for each of the colored textures belonging to a given color range and texture type. The whiskers extend to the extreme values. Letter symbols are depicted on top of each box; different letters indicate significant differences between groups, determined using p-values at the selected level of $\alpha = 5\%$.
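The Holm adjustment applied to the post-hoc p-values can be sketched in a few lines of Python. This is a generic illustration of the step-down procedure of Holm (1979), not the code of the PMCMRplus package; the function name and the three p-values are hypothetical.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values, in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda k: p_values[k])  # indices by increasing p-value
    adjusted = [0.0] * m
    running_max = 0.0
    for position, index in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - position) * p_values[index])
        # Taking the running maximum keeps the adjusted values monotone.
        running_max = max(running_max, candidate)
        adjusted[index] = running_max
    return adjusted


# Hypothetical raw p-values from three pairwise comparisons.
adjusted = holm_adjust([0.01, 0.04, 0.03])
```

A comparison is then declared significant at level $\alpha$ when its adjusted p-value is below $\alpha$; in this example only the first comparison would be retained at the 5% level.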
The estimates of the index of discrimination $\nu$ differ from one texture type to another, whatever the color range (see Figure 5). This reflects the fact that the observers' ability to discriminate between the colored textures according to visual contrast varies with the color ranges and the texture types. The probability of no choice tends to 1 when $\nu$ tends to infinity and to 0 when $\nu$ tends to 0: the smaller the value of $\nu$, the better the ability to discriminate between the textures. The estimates of $\nu$ are generally small in the green color range, indicating that discrimination is best in this color range. Moreover, on the basis of the index of discrimination $\nu$, one can classify either the textures with respect to the pattern they belong to for a given color range, or the color ranges for a given texture type. For instance, in the green color range, the Isotropic textures are the best discriminated, with $\hat{\nu} = 0.6027$, followed by the Random-dot textures ($\hat{\nu} = 0.7273$), the Horizontal gratings ($\hat{\nu} = 1.2114$) and finally the Vertical gratings ($\hat{\nu} = 1.3243$). The same ordering holds in the red and yellow color ranges. Whatever the texture type, the perception of the set of observers, which determines their ability to discriminate, is sharper in the green color range than in the other color ranges. The red color range follows, and finally the yellow color range, except for the Isotropic textures, for which the yellow color range comes before the red one.
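The classification by index of discrimination amounts to sorting the estimates in increasing order. The short Python sketch below does this for the four estimates reported above for the green color range; the dictionary layout and the function name are assumptions made for the example.

```python
# Estimates of nu reported in the text for the green color range.
nu_green = {
    "Isotropic textures": 0.6027,
    "Random-dot textures": 0.7273,
    "Horizontal gratings": 1.2114,
    "Vertical gratings": 1.3243,
}


def rank_by_discrimination(nu_estimates):
    """Order the keys from best discriminated (smallest nu) to worst."""
    return sorted(nu_estimates, key=nu_estimates.get)


green_order = rank_by_discrimination(nu_green)
```

The same function could be applied to the estimates of a fixed texture type across color ranges in order to classify the color ranges.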
The construction of Fechnerian scales on physical dimensions of stimuli relies on the concept of threshold, namely the absolute threshold and the discrimination threshold. Fundamentally, two kinds of task are used to obtain thresholds (Dzhafarov & Colonius, 1999; Pelli & Farell, 1995): classification and adjustment. In the former, the experimenter controls the stimulus and the observer makes a judgment based on the resulting percept, whereas in the latter the observer adjusts the stimulus to satisfy a perceptual criterion specified by the experimenter. Since the adjustment task is subjective, classification tasks are more suitable for determining these thresholds. In the literature, three kinds of classification are widely used: yes/no, two-alternative forced choice (2afc) and identification. All three call for the observer to classify stimuli (or their subjective responses). Among these three methods, identification would give the best results; see, e.g., Pelli & Farell (1995). The drawback of the 2afc task is that, because the observer has only two alternatives and will thus be right half the time even if the luminance contrast is invisible, a relatively large number of trials is required to obtain a good threshold estimate. This method would rank second, after the identification task. The observer in a yes/no experiment cannot avoid introducing an internal subjective criterion in deciding whether each faint, ambiguous percept deserves a 'yes' or a 'no'. This task would be used only as a last resort.

Discussion
The findings of this study clearly show that the ability to discriminate luminance contrast in chromatic textures varies with the color ranges and the texture patterns. The observers' discrimination ability is sharpest in the green color range, followed by the red color range and finally the yellow color range, except for the Isotropic textures. For this type of texture, discrimination is sharpest in the green color range, followed by the yellow color range and finally the red one. Thus, luminance contrast is more easily detected in green textures than in textures of the other color ranges. This result supports the hypothesis that color information is processed independently from many luminance-based tasks (Livingston & Hubel, 1987).
Whatever the color range, the ability of discrimination is best for the Isotropic textures, followed successively by the Random-dots textures and the Horizontal gratings, and finally the Vertical gratings. It follows that luminance contrast detection in chromatic images does not depend only on the luminance and chromatic attributes. This result is in agreement with those obtained by Gegenfurtner & Kiper (1990), who studied contrast detection in luminance and chromatic noise. They showed that it is incorrect to assume that psychophysical performance is always determined by the activity of just a luminance and a chromatic mechanism, and demonstrated the existence of additional mechanisms that determine performance in contrast detection. They concluded that the possible involvement of these mechanisms must be considered when one interprets the results of other visual tasks. Our data support the hypothesis that the importance of color to visual performance can be revealed if chromatic and luminance variations are both present in visual targets (Gur & Akri, 1992). Moreover, Gur & Akri showed that when color and luminance contrasts are combined, human contrast sensitivity is enhanced at all spatial frequencies tested and that there is a substantial facilitatory interaction between the chromatic and achromatic systems.
A large-sample test based on the likelihood ratio shows that the perception of the set of observers was sharp enough to detect the existing luminance contrast differences. Furthermore, we tested whether $m_i = m_j$ across pairs of stimuli (see, e.g., Dzhafarov & Perry, 2014) via post-hoc tests. These tests allowed us to find out which colored textures differ from the others. The data analysis gives rise to a sensory scale that shows a monotonic functional relationship with the Michelson contrast on which the ranking experiment is based: the shapes of the curves in Figs. 2-4 of the Bradley-Terry-Luce parameters ($\pi$) versus the Michelson contrast ($M$) of the colored textures indicate higher preference for lower contrast. We recall that the observers ranked the stimuli according to their appreciation of the perceived visual contrast, from the lowest contrast to the highest contrast, using integers from 1 to 20, with rank 1 assigned to the texture with the lowest visual contrast. Therefore, in the context of our experiment, the preference parameter $\pi_i$ relates to the luminance contrast sensitivity for stimulus $i$: higher preference means lower luminance contrast sensitivity. This is what the shapes of the curves in Figs. 2-4 show. In short, the Fechnerian scales on the physical dimension of the colored textures depend on the chromaticness of the colored phases of the textures and on the texture types. The psychophysical method of identification would be the best for determining the related thresholds.
During the ranking experiment, each observer took an average of 90 minutes to complete a full round, which is rather long, even given the remarkable capacity for chromatic adaptation of the human visual system. Indeed, fatigue impairs an observer's perception. The color space used for the physical description of the colored textures is that of the CIE 1931 2-degree standard observer, and the contrast formula used is the Michelson one.
International Journal of Statistics and Probability, Vol. 10, No. 1; 2021

Percentage of Nonuniformity in the Data Explained by the Model
We define, and $\mathrm{Denom} = 2\sum_{r} N_r \log\big(N_r/(n/K)\big)$, with $n$ the number of observers, $N_r$ the observed frequency of the ranking with ties $r$ and $K = \mathrm{card}(A(q))$.
If the model fits the data exactly, $R^2 = 1$, and if it performs no better than the uniform distribution, $R^2 = 0$.