Structured Laser Illumination Planar Imaging Based Classification of Ground Coffee Using Multivariate Chemometric Analysis

Most commercially available ground coffees are processed from Robusta or Arabica coffee beans. In this work, we report on the potential of the Structured Laser Illumination Planar Imaging (SLIPI) technique for classifying five commercial Robusta and Arabica ground coffee samples (Familial, Belier, Brazil, Colombia and Malaga). The classification is based on the extinction coefficient μe and the optical depth OD, both measured with SLIPI. The technique has the advantage of rejecting the light intensity from photons that have been multiply scattered in the coffee solution, leading to an accurate and reliable measurement of μe. The data are analyzed with two chemometric techniques: Principal Component Analysis (PCA) for variable selection and Hierarchical Cluster Analysis (HCA) for classification. The chemometric model correctly classifies the coffee samples according to their species, demonstrating the potential of this approach for the practical assessment of coffee grades.


Introduction
Coffee is a popular beverage consumed throughout the world (Oder, 2015). Although coffee is not widely consumed on the African continent, African countries make a significant contribution to world coffee trade: Ethiopia was the third largest coffee producer in the world in 2015, and Cote d'Ivoire ranked twelfth (International Coffee Organization, 2015). The quality of coffee depends on many factors, such as the growth environment and processing techniques (Carelli et al., 2006; Clifford & Willson, 1985). Arabica and Robusta varieties represent over 90% of world coffee production. Arabica coffee, originally from Ethiopia, is now widely cultivated in South America. Robusta, a variety of the canephora species, is grown in Africa (mainly in Cote d'Ivoire) and in the Far East (Vietnam in particular). Robusta coffee is richer in caffeine than Arabica (2% to 3% versus about 1.3%; National Coffee Association of USA, 2015). The Arabica variety fetches higher prices on the world market.
To distinguish between Robusta and Arabica varieties on the market, non-specialists rely on the information provided on the packaging. As is the case with other high-value agricultural products, there have been increasing incidences of counterfeit coffee on sale in the market. Chemical and genetic procedures for identifying the origin of ground coffee exist, but they are time consuming. Sensory evaluation is another option, but it is not suited to accurate and repeatable classification. Measurement techniques that can objectively discriminate between Robusta and Arabica ground coffee are therefore highly desirable. Optical laser techniques are now routinely used for quantitative measurements in a large number of applications. Here, we measure the extinction coefficient (e.g. Berrocal et al., 2012; Kristensson et al., 2011; Kristensson et al., 2012) of different coffee solutions and analyze the results by means of chemometrics.
Chemometrics is the use of mathematical and statistical methods to extract chemical information from an experimental data set and to correlate it with quality parameters or physical properties. The basic process involves modeling patterns in the data; the models are then applied routinely to new data in order to predict quality parameters. The chemometric approach has been gaining interest for assessing product quality. Its only requirements are reliable measurements and adequate software to interpret the patterns in the data.
In this article, the extinction coefficient of various coffee solutions is measured using a recent approach called single-phase Structured Laser Illumination Planar Imaging (SLIPI) (Berrocal et al., 2012). Whereas a transmission measurement records the intensity of a single light beam before and after it crosses the sample of interest, SLIPI images a spatially modulated light sheet from the side (at a 90° angle). The main advantage of SLIPI over conventional transmission measurements is its ability to reject the light intensity from multiply scattered photons, which allows a more accurate measurement of the extinction coefficient in turbid media such as coffee solutions. We then combine the SLIPI measurements with a chemometric data analysis to obtain a reliable classification of various types of coffee.
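Why multiply scattered light matters for a transmission measurement can be illustrated with the Beer-Lambert law, I = I0 exp(-μe l), which gives μe = -ln(I/I0)/l. The short Python sketch below is illustrative only: the true extinction coefficient, the path length and the 5% of stray multiply scattered light reaching the detector are hypothetical values, not measurements from this study. It shows that any extra detected light biases the estimated μe downwards.

```python
import math


def extinction_from_transmission(i0: float, i: float, length_cm: float) -> float:
    """Beer-Lambert estimate of the extinction coefficient: -ln(I/I0) / l, in 1/cm."""
    return -math.log(i / i0) / length_cm


# Hypothetical values chosen for illustration, not data from this study.
mu_true = 2.0   # 1/cm, assumed true extinction coefficient
length = 1.0    # cm, assumed path length through the sample
i0 = 1.0        # incident intensity (arbitrary units)

# Ideal case: only unscattered (ballistic) light reaches the detector.
i_ballistic = i0 * math.exp(-mu_true * length)

# Turbid case: some multiply scattered light is also detected.
# The extra 5 % of I0 is an arbitrary assumption.
i_measured = i_ballistic + 0.05 * i0

print(extinction_from_transmission(i0, i_ballistic, length))  # recovers mu_true
print(extinction_from_transmission(i0, i_measured, length))   # smaller than mu_true
```

The second estimate is lower than the true value because the detector cannot tell ballistic photons from multiply scattered ones; SLIPI addresses this by using only the modulated part of the signal.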

Sample Preparation
All coffee samples were prepared following the same procedure. For each type of coffee, the solution was prepared by weighing 4 g of coffee on a Sartorius VIC-303 balance (1 mg resolution) and dissolving it in 100 mL of boiled distilled water. We stirred the water and coffee mixture for 15 seconds to obtain a homogeneous suspension, filtered the suspension into a sealed glass flask and left it to cool to 20 °C before starting the measurements.

Experimental Setup
The SLIPI technique was first developed and applied in 2008 for imaging spray systems typically used in combustion engines (Berrocal et al., 2008). Its various configurations are described in the doctoral thesis of E. Kristensson (Kristensson et al., 2012). Here, we employ the single-phase SLIPI approach to measure the extinction coefficients and optical depths; the method is presented and fully described in Berrocal et al. (2012). Figure 1 shows a schematic of our experimental setup.
In the experiment, a coffee solution in a cuvette is illuminated with a spatially modulated laser sheet, formed with a 5 lp/mm Ronchi grating and shaped by spherical and cylindrical lenses. The incident light is produced by three diode lasers emitting at 450 nm, 532 nm and 638 nm, respectively. A 650 nm high-pass filter is placed in front of the camera so that only the laser-induced fluorescence signal from the coffee is detected.
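The key idea of SLIPI is that singly scattered light keeps the spatial modulation of the light sheet, whereas multiply scattered light forms an approximately unmodulated background. The processing actually used is described in Berrocal et al. (2012); the Python sketch below swaps in a simpler lock-in style demodulation and runs it on a synthetic image. It assumes that the modulation runs along the image rows with a known period in pixels, that each column spans a whole number of periods, and that the pixel size, extinction coefficient and background shape are hypothetical values. The modulation amplitude is extracted column by column, and μe is recovered from the slope of its logarithm along the propagation direction.

```python
import numpy as np


def modulation_amplitude(image: np.ndarray, period_px: float) -> np.ndarray:
    """Amplitude of the sinusoidal modulation in each column of `image`.

    Assumes the modulation runs along axis 0 (rows) with a known period and
    that each column covers an integer number of periods, so the unmodulated
    background cancels exactly in the projections below.
    """
    n_rows = image.shape[0]
    phase = 2.0 * np.pi * np.arange(n_rows) / period_px
    # Project every column onto cosine and sine at the modulation frequency.
    c = 2.0 / n_rows * (np.cos(phase) @ image)
    s = 2.0 / n_rows * (np.sin(phase) @ image)
    return np.hypot(c, s)


# Synthetic image with hypothetical parameters (not data from this study).
n_rows, n_cols, period = 200, 100, 20.0  # 10 full periods per column
pixel_cm = 0.01                          # assumed pixel size along the beam (x)
mu_e = 3.0                               # assumed extinction coefficient, 1/cm
x = np.arange(n_cols) * pixel_cm
y = np.arange(n_rows)[:, None]

amplitude = np.exp(-mu_e * x)              # modulated, singly scattered part
background = 0.5 + 0.3 * np.exp(-1.0 * x)  # unmodulated stand-in for multiple scattering
image = background + amplitude * np.cos(2 * np.pi * y / period + 0.7)  # arbitrary phase

amp = modulation_amplitude(image, period)
slope, _ = np.polyfit(x, np.log(amp), 1)   # ln(amplitude) decays linearly in x
print(-slope)                              # estimate of mu_e
```

Because the background carries no modulation, it drops out of the projections, so the fitted decay reflects only the modulated signal.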

Data Analysis
Our clustering algorithm uses the extinction coefficient and the optical depth of each sample for each of the three illumination wavelengths. Each sample is therefore described by six variables, which makes this a multivariate statistical problem. In data sets containing many variables, groups of variables are often inter-related, because several variables may reflect the same underlying property of the system. This redundancy can be exploited by replacing a group of variables with a single new variable, which is what Principal Component Analysis (PCA) does. PCA generates a new set of variables called principal components. Each principal component is a linear combination of the original variables, and all principal components are mutually orthogonal, so there is no redundancy of information between them (Jolliffe, 2002; François Husson, 2014; Besse, 1992).
The first principal component accounts for the largest share of the variance and therefore carries the most information; the second principal component accounts for the second largest share, and so on.
With this information, the original data can be reduced to a few variables that capture the significant contrasts and trends, by discarding the components that do not contribute to the desired contrast. Adding more dimensions does not provide additional contrast; it mainly adds noise and reduces the potential contrast of the outcome.
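The PCA steps described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the software used in this study: it autoscales each variable to unit variance (one common way of handling variables with different units), computes the principal components with a singular value decomposition, and returns the fraction of variance explained by each component. The data matrix is synthetic and hypothetical: ten samples and six variables generated from two hidden factors plus a little noise, so the first two components should dominate.

```python
import numpy as np


def pca(data: np.ndarray):
    """PCA on autoscaled data (each variable centered and scaled to unit variance).

    Returns the scores, the loadings (one column per component) and the
    fraction of the total variance explained by each principal component.
    """
    centered = data - data.mean(axis=0)
    scaled = centered / centered.std(axis=0, ddof=1)
    # SVD: scaled = U S Vt; the rows of Vt are the principal directions.
    u, s, vt = np.linalg.svd(scaled, full_matrices=False)
    scores = u * s                       # coordinates of the samples on the components
    explained = s**2 / np.sum(s**2)      # sorted in decreasing order
    return scores, vt.T, explained


# Hypothetical data: 10 samples x 6 variables driven by two hidden factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(10, 2))
mixing = rng.normal(size=(2, 6))
data = latent @ mixing + 0.01 * rng.normal(size=(10, 6))

scores, loadings, explained = pca(data)
print(np.round(explained, 3))  # the first two components carry almost all the variance
```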
Hierarchical clustering and its dendrogram representation (Hastie, Tibshirani, & Friedman, 2009) were then applied to the PCA scores to summarize their mutual distances, to check whether the data points form discrete clusters in the new coordinate system, and to see how closely these clusters are related.
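Agglomerative hierarchical clustering can be sketched as follows: start with every point in its own cluster and repeatedly merge the two closest clusters; stopping when a given number of clusters remains is equivalent to cutting the dendrogram at that level. The linkage rule is not stated in the text, so this Python sketch assumes average linkage with the Euclidean distance. The two-dimensional "PCA scores" in the usage example are hypothetical.

```python
import numpy as np


def average_linkage(points: np.ndarray, n_clusters: int) -> np.ndarray:
    """Agglomerative clustering with average linkage and Euclidean distance.

    Merges the two closest clusters until `n_clusters` remain and returns
    one integer label per point.
    """
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = (np.inf, 0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average of all pairwise distances between the two clusters.
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters.pop(b)  # b > a, so index a is unaffected by the pop
        clusters[a].extend(merged)
    labels = np.empty(len(points), dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels


# Hypothetical scores on the first two principal components.
scores = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
                   [3.0, 3.1], [3.2, 2.9]])
labels = average_linkage(scores, 2)
print(labels)  # [0 0 0 1 1]
```

This brute-force version is fine for a handful of samples; for larger data sets a library implementation of the linkage step would be preferable.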

Results
The coffee samples were identified as follows, based on the labels on their respective packages: Belier and Malaga are 100% Robusta, while Brasil, Colombia and Familial are 100% Arabica.

SLIPI Results
The experimental results of the SLIPI measurements for the different ground coffees are presented in Figure 3 (a, b, c, d, e), and the extinction coefficients and optical depths obtained with the three lasers (450 nm, 532 nm and 638 nm) are listed in Tables 1, 2 and 3. All tables show differences between the coffees in both extinction coefficient and optical depth. The corresponding extinction coefficients are plotted in Figure 8(a), together with the ratio of the extinction coefficients (for each illumination scheme) in Figure 8(b). However, these values alone do not reveal which species (Arabica or Robusta) each coffee belongs to. Features suitable for classification should be mutually independent, whereas the extinction coefficients and optical depths measured at the different wavelengths are not. We therefore used chemometrics, which offers many ways to solve this kind of discrimination problem.

Chemometric Results
Before performing the analysis, we checked the correlation between all the variables. The correlation between some variables is as high as 60% (Figure 8).
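A correlation check of this kind amounts to computing the correlation matrix of the samples-by-variables data table. In the Python sketch below, the table is hypothetical: fifteen samples, with μe at three wavelengths drawn at random, and OD built from μe with an assumed path length of 1 cm plus some assumed measurement noise. None of these values come from the measurements reported here.

```python
import numpy as np

# Hypothetical data table: rows are samples, columns are the six variables
# (mu_e at 450, 532 and 638 nm, then OD at the same wavelengths).
rng = np.random.default_rng(1)
mu_e = rng.uniform(1.0, 4.0, size=(15, 3))            # 1/cm, illustrative values
od = mu_e * 1.0 + 0.2 * rng.normal(size=(15, 3))      # assumed l = 1 cm plus noise
data = np.hstack([mu_e, od])

corr = np.corrcoef(data, rowvar=False)                # 6 x 6 correlation matrix
off_diag = np.abs(corr[~np.eye(6, dtype=bool)])       # drop the trivial diagonal
print(np.round(corr, 2))
print("largest |r| between two different variables:", off_diag.max())
```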
Figure 8. Correlation between the variables.
Note the large correlation between the extinction coefficient and the optical depth at the same wavelength. This high correlation is explained by Equation 1:

$$\mathrm{OD} = \mu_e \, l \qquad (1)$$

where μe is the extinction coefficient. The optical depth OD approximates the mean number of scattering events occurring through a scattering medium of length l. The extinction coefficient is the sum of the scattering coefficient μs and the absorption coefficient μa (Equation 2):

$$\mu_e = \mu_s + \mu_a \qquad (2)$$

PCA was then applied to construct new, independent variables that are linear combinations of the original ones. Because the variables do not have the same units, PCA was applied using the inverse variances of the data as weights. To determine which components have a high variance and must be retained to describe the data, we made a scree plot of the percent variability explained by each principal component (Figure 9). The scree plot shows only the first two of the six components, which together explain 99.7% of the total variance. Only the first and second principal components were therefore retained.
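The retention decision read from a scree plot can be written as a simple rule: keep the smallest number of leading components whose cumulative share of the variance reaches a chosen threshold. The Python sketch below assumes a 99% threshold, which is not stated in the text, and uses hypothetical variance fractions chosen only to resemble the two-component, 99.7% situation described above.

```python
import numpy as np


def n_components_to_retain(explained: np.ndarray, threshold: float = 0.99) -> int:
    """Smallest number of leading components whose cumulative share of the
    total variance reaches `threshold` (a simple scree-plot rule)."""
    cumulative = np.cumsum(explained)
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(explained))  # guard against rounding when threshold is 1.0


# Hypothetical explained-variance fractions for six components.
explained = np.array([0.90, 0.097, 0.002, 0.001, 0.0, 0.0])
print(n_components_to_retain(explained))  # 2
```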
We then applied hierarchical clustering to these two new variables, using the Euclidean distance as the metric. Hierarchical clustering groups data over a variety of scales by creating a cluster tree. We used the silhouette criterion to decide where to cut the tree. The silhouette value of each point measures how similar that point is to the points in its own cluster compared with the points in other clusters. The silhouette value of the i-th point, $S_i$, is defined as

$$S_i = \frac{b_i - a_i}{\max(a_i, b_i)}$$

where $a_i$ is the average distance from the i-th point to the other points in the same cluster, and $b_i$ is the minimum, over all other clusters, of the average distance from the i-th point to the points of that cluster. A high silhouette value indicates that the point is well matched to its own cluster and poorly matched to neighboring clusters (Kaufman & Rousseeuw, 2009). Figure 6 shows the silhouette criterion values for the numbers of clusters tested. The highest silhouette value occurs at five clusters, suggesting that the optimal number of clusters is five. The resulting cluster tree obtained with HCA is shown in Figure 10.
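The silhouette definition above translates directly into code. The Python sketch below computes $S_i$ for every point of a labeled data set with Euclidean distances; it assumes at least two clusters and, as a convention of this sketch, sets the value of a point in a singleton cluster to 0. The points and the two candidate labelings are hypothetical. In practice, the mean silhouette would be computed for each candidate number of clusters and the number with the highest mean kept.

```python
import numpy as np


def silhouette_values(points: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Silhouette S_i = (b_i - a_i) / max(a_i, b_i) for every point (Euclidean).

    Assumes at least two distinct labels; points alone in their cluster get 0.
    """
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    s = np.zeros(len(points))
    for i in range(len(points)):
        same = labels == labels[i]
        same[i] = False
        if not same.any():
            continue  # singleton cluster: silhouette left at 0
        a = dist[i, same].mean()  # mean distance to the rest of its own cluster
        b = min(dist[i, labels == k].mean()  # closest other cluster, on average
                for k in np.unique(labels) if k != labels[i])
        s[i] = (b - a) / max(a, b)
    return s


# Hypothetical PCA scores and two candidate partitions.
scores = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
                   [3.0, 3.1], [3.2, 2.9]])
good = np.array([0, 0, 0, 1, 1])
bad = np.array([0, 1, 0, 1, 0])

good_mean = silhouette_values(scores, good).mean()
bad_mean = silhouette_values(scores, bad).mean()
print(good_mean, bad_mean)  # the natural grouping scores much higher
```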

Discussion
Chemometrics and SLIPI are both powerful techniques for spectroscopic studies; they have been used as complementary methods in this study.
The multivariate approach consisted of the following steps: pre-processing, PCA, variable selection and HCA classification. The data collected with the SLIPI technique for each dataset (Belier, Malaga, Brasil, Colombia and Familial ground coffee) were used to assess the suitability of this technique for detecting similarities between the ground coffee samples.
For every pre-treated dataset, PCA was performed as an exploratory tool to obtain an overview of the spread of the data. The variance explained by each principal component was used to retain the components that best describe the variability among the coffee types and the sample groupings.
The HCA plot shows a good grouping of the samples into the two classes in the space defined by the first two principal components. This shows that the extinction coefficient and optical depth measured with the SLIPI technique could be useful for discriminating between coffee species.

Conclusion
This strategy showed a clear grouping of the coffees into the two classes (Arabica and Robusta). We conclude that the SLIPI technique combined with chemometric analysis of coffee samples offers complementary information for the discrimination of products and can be used to classify and evaluate coffee samples accurately.

Figure 1. Description of the single-phase SLIPI optical arrangement: a light sheet with a vertically modulated intensity profile is formed and illuminates the coffee solution. Images of the spatially modulated light sheet are recorded from the side using an EM-CCD camera. By extracting the amplitude of the modulation, the exponential light extinction through the cuvette can be observed. The measurements are performed sequentially for three illumination wavelengths: 450 nm, 532 nm and 638 nm.

Figure 2. Quantum efficiency curve of the Andor Luca-R EM-CCD camera (data curve from Andor Technology). The wavelengths corresponding to each illumination scheme and to the low-pass filter fixed on the camera objective are also indicated.

Experimental results of extinction coefficient and optical depth from SLIPI measurements with Belier coffee at 450 nm (a), 532 nm (b) and 638 nm (c).

Experimental results of extinction coefficient and optical depth from SLIPI measurements with Malaga coffee at 450 nm (a), 532 nm (b) and 638 nm (c).

Experimental results of extinction coefficient and optical depth from SLIPI measurements with Familial coffee at 450 nm (a), 532 nm (b) and 638 nm (c).

Experimental results of extinction coefficient and optical depth from SLIPI measurements with Colombia coffee at 450 nm (a), 532 nm (b) and 638 nm (c).

Experimental results of extinction coefficient and optical depth from SLIPI measurements with Brasil coffee at 450 nm (a), 532 nm (b) and 638 nm (c).

Figure 8. (a) Measured extinction coefficient of each type of coffee for the three illumination wavelengths. (b) Ratio of the extinction coefficients for each type of coffee.

Figure 9. Scree plot of the percent variability explained by the first and second principal components.

Table 1. SLIPI results using laser illumination at 450 nm