Selecting Variables for Near Infrared Spectroscopy (NIRS) Evaluation of Mango Fruit Quality

Near infrared spectroscopy (NIRS) can be applied to assess mango quality. The purpose of this research was to select appropriate chemical absorption bands for evaluating purees of two mango cultivars, cv. Keitt and cv. Nam Dok Mai Si Thong. Six main chemical substances found in mango fruit (glucose, sucrose, citric acid, malic acid, starch and cellulose) were evaluated and their absorption bands were identified. Mango puree was mixed with each of the six pure substances at various concentrations: glucose, sucrose, citric acid and malic acid at 0, 5, 10, 15 and 20% w/w, and starch and cellulose at 0, 2.5, 5, 7.5 and 10% w/w. Spectra were scanned with a NIRSystem 6500 over the wavelength range 400-1100 nm. Partial least squares regression (PLSR) was used to develop a model for each component and to identify the wavelengths that correspond to it. Second derivative spectra of the glucose, sucrose, citric acid, malic acid, starch and cellulose mixtures gave the best PLSR results. The coefficient of determination (R²) of all models was 0.99, and the standard error of calibration (SEC) and standard error of prediction (SEP) were less than 0.5% w/w. The regression coefficient plots exhibited more sharp peaks than the spectra of the pure substances. Wavelength selection for NIRS evaluation of mango fruit quality therefore could not be done using only the measured spectra of pure substances. The mango cultivar, however, had no effect on wavelength selection by the PLSR model. The most effective wavelengths were 900-1000 nm for glucose and sucrose, 800-1000 nm for citric acid and malic acid, 900-1000 nm for starch and 800-1000 nm for cellulose.


Introduction
Fruit quality can be evaluated using near infrared spectroscopy (NIRS). This technique can be used to measure chemical composition, such as the sucrose content of peach (Kawano et al., 1992), the soluble solid content of Satsuma mandarin (Kawano et al., 1993), pineapple (Guthrie et al., 1998), apple (Lovász et al., 1994; McGlone et al., 2003), kiwifruit (McGlone & Kawano, 1998; Schaare & Fraser, 2000), mango (Saranwong et al., 2001), prune (Slaughter et al., 2003), Gannan navel orange (Liu et al., 2010), watermelon (Sun et al., 2010), banana (Subedi et al., 2011) and jujube (Wang et al., 2011), and the acidity of mango (Schmilovitch et al., 2000) and Satsuma mandarin (Miyamoto, 1998; Gomez et al., 2006). Moreover, it can be applied to detect damaged tissue, such as brownheart of Braeburn apple (McGlone et al., 2005), scald, scab tissue and recent bruises of Jonagold apple (Kleynen et al., 2003), and browncore of Chinese pear (Han et al., 2006), and to estimate physiological variables such as the maturity of Pawpaw papaya (Greensill & Newman, 1999) and Scarlet apple (Bertone et al., 2012).
Mango is a tropical fruit with large export markets in Asia, Europe and North America (Litz, 2009). Fruit quality is frequently heterogeneous, partly because of the difficulty of harvesting fruit at the correct maturity stage. Non-destructive classification and selection of mango fruit by NIRS based on quality or maturity is therefore of considerable interest.
Absorption in the infrared part of the electromagnetic spectrum is due to molecular vibrations (Williams & Norris, 2001; Wills et al., 2007). In the NIR region (700-2500 nm), it is due to overtones and combinations of the fundamental vibration frequencies of bonds, particularly O-H, C-H and N-H bonds (Ozaki et al., 2007). The absorption bands of a pure chemical can therefore be identified, whereas for compounds within a mixture it is difficult to specify the exact wavelength. Multivariate statistical methods are used to solve this problem.
Multiple linear regression (MLR) and partial least squares regression (PLSR) are the most frequently used chemometric methods for making NIRS calibrations (Burns & Ciurczak, 2001; Liu et al., 2010). In MLR, a few variables (wavelengths) are selected for inclusion in the model. In contrast, PLSR uses all of the variables in the calibration model. For both MLR and PLSR, selecting a subset of variables can be important in optimising NIRS calibrations (Siesler et al., 2002). Most studies report that variables are selected using information from the literature, the wavelengths associated with particular bonds or compounds, and the response of the model to changes in the studied compound. Candidate selections can be judged on the calibration statistics: the standard error of calibration (SEC), the standard error of prediction (SEP) and the average difference between predicted and actual values (bias). However, the assignment of bonds and compounds to absorptions at specific NIR wavelengths is not easy. The purpose of this work was therefore to select the most suitable wavelengths (variables) to include in the calibration model for mango fruit evaluation by NIRS. One way of determining which wavelengths correspond to a particular compound is to add it to a matrix and observe how the resulting spectrum differs from the original spectrum (Marten & Naes, 1989). This has been done with some food products and pharmaceuticals, but not with fruits. Six pure substances (sucrose, glucose, citric acid, malic acid, starch and cellulose) were added at different concentrations to purees of two mango cultivars, cv. Keitt (Spanish mango) and cv. Nam Dok Mai Si Thong (Thai mango).

Materials and Methods
Six pure substances found in mango fruit (glucose, sucrose, citric acid, malic acid, starch and cellulose; Fluka Analytical, Germany) were used in this study. Puree of mango fruit cv. Keitt was prepared using a blender (Moulinex, UK) and then mixed with each of the six pure substances at various concentrations: glucose, sucrose, citric acid and malic acid at 0, 5, 10, 15 and 20% w/w; starch and cellulose at 0, 2.5, 5, 7.5 and 10% w/w. Air bubbles in the mixtures were reduced by stirring with a magnetic stirrer for one hour. One gram of sample was placed in a rotating cup with a gold reflection plate. NIR reflectance spectra were measured over the wavelength range 400-1100 nm using a NIRSystem 6500 (FossNIRsystem, Silver Spring, USA) equipped with a spinning module. The spectra of the pure substances were then recorded under the same conditions. To study the effect of mango cultivar on NIR absorption band identification, mixtures of Thai mango puree cv. Nam Dok Mai Si Thong and the six pure substances were also prepared. Their spectra were measured by the same procedure and with the same NIR instrument, but with a different sample cell: a cuvette cell on the transportation module. The spectra of the pure substances were also measured over a similar wavelength range to identify their absorption bands.
The spectral data of each substance mixture were separated into two sets, calibration and validation. They were then pretreated with mathematical techniques, namely standard normal variate (SNV), multiplicative scatter correction (MSC) and Savitzky-Golay second derivative, to extract the important information. PLSR was then applied to the calibration set to develop a model for predicting the concentration of pure substance added to the puree, and model performance was tested on the validation set. The model with a high coefficient of determination (R²) and low SEC and SEP was selected as the best. Data analysis was done using the Unscrambler software (version 9.8, CAMO, Oslo, Norway).
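The two scatter corrections named above can be illustrated with a minimal sketch. It is not the Unscrambler implementation used in this study; the function names and the absorbance values are hypothetical, and the MSC reference is assumed to be the mean spectrum of the set being corrected.

```python
from statistics import mean, stdev

def snv(spectrum):
    """Standard normal variate: centre one spectrum on its own mean and
    scale it by its own standard deviation."""
    m = mean(spectrum)
    s = stdev(spectrum)
    return [(a - m) / s for a in spectrum]

def msc(spectra):
    """Multiplicative scatter correction: regress each spectrum on the mean
    spectrum of the set (an assumed choice of reference), then remove the
    fitted offset and slope."""
    ref = [mean(col) for col in zip(*spectra)]
    ref_mean = mean(ref)
    sxx = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = mean(s)
        slope = sum((r - ref_mean) * (a - s_mean) for r, a in zip(ref, s)) / sxx
        offset = s_mean - slope * ref_mean
        corrected.append([(a - offset) / slope for a in s])
    return corrected

# Hypothetical absorbance values: one band shape recorded with three
# different scatter offsets and gains.
base = [0.20, 0.50, 0.90, 0.40, 0.30]
spectra = [
    base,
    [2.0 * a + 0.5 for a in base],
    [0.5 * a - 0.1 for a in base],
]
snv_spectra = [snv(s) for s in spectra]
msc_spectra = msc(spectra)
```

In this constructed case the three spectra differ only by an additive offset and a positive gain, so both treatments map them onto a single curve. Real spectra also differ in composition, and that difference is what the calibration is meant to retain.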

Spectra of Whole Mango Fruit, Puree and Mixtures
The mean original spectrum of the whole mango fruit showed clear peaks at 476 nm in the visible region and at 978 nm in the NIR region. The spectra of the puree and the mixtures showed a pattern similar to that of the mean whole fruit spectrum (Figure 1). The absorption of the mixture spectra decreased as the percentage of pure substance increased (data not shown), and the whole mango fruit spectrum had higher absorption values than the puree and mixture spectra. The spectral data were then treated with the Savitzky-Golay second derivative (10 nm averaging on the left and right sides). After this treatment, three negative peaks at 746, 842 and 966 nm were found in the whole mango fruit spectrum, and two negative peaks at 490 and 968 nm were found in the puree and mixture spectra (Figure 2). The water absorption band thus shifted from 978 nm to 966 nm, and two weak absorption peaks at 746 and 842 nm appeared in the whole mango fruit spectrum. Before suitable wavelengths (variables) are selected for model development, the absorption band of each constituent should be identified. The difference spectra were therefore investigated; spectral data in the wavelength range 400-700 nm were excluded to avoid the effect of the pigment absorption bands of carotenoids and chlorophyll.
Then the PLSR calibrations were developed.
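The second-derivative treatment described in this section turns an absorption band into a negative peak at the band position. The sketch below shows the idea with a local quadratic least-squares fit, which is the basis of the Savitzky-Golay filter. It rests on assumptions: a 2 nm sampling interval (so that 10 nm on each side is five points), a quadratic polynomial, and a synthetic Gaussian band. None of these is stated in the text, and the function name is hypothetical.

```python
import math

def savgol_second_derivative(y, half_width, step=1.0):
    """Second derivative from a quadratic least-squares fit over a moving
    window of 2 * half_width + 1 points. Edge points that lack a full
    window are dropped, so the output is shorter than the input."""
    m = half_width
    offsets = range(-m, m + 1)
    n = 2 * m + 1
    s2 = sum(k * k for k in offsets)
    s4 = sum(k ** 4 for k in offsets)
    denom = n * s4 - s2 * s2
    # Each weight is twice the least-squares coefficient of the k^2 term.
    weights = [2.0 * (n * k * k - s2) / denom for k in offsets]
    out = []
    for i in range(m, len(y) - m):
        window = y[i - m:i + m + 1]
        out.append(sum(w * v for w, v in zip(weights, window)) / step ** 2)
    return out

# Synthetic band centred at 970 nm (hypothetical values, not measured data).
step_nm = 2                                   # assumed sampling interval
wavelengths = list(range(900, 1041, step_nm))
absorbance = [math.exp(-((w - 970) / 30.0) ** 2) for w in wavelengths]

half_width = 5                                # 10 nm on each side at 2 nm steps
d2 = savgol_second_derivative(absorbance, half_width, step=step_nm)
d2_wavelengths = wavelengths[half_width:len(wavelengths) - half_width]
band_centre = d2_wavelengths[d2.index(min(d2))]
```

Here `band_centre` is the wavelength of the most negative second-derivative value, which is how the negative peaks reported above would be located in a treated spectrum.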

Spectra Differentiation
The difference of the mean original spectra was obtained by subtraction between the puree and mixture spectra of each constituent. The glucose, sucrose, citric acid and malic acid mixtures showed a negative absorption band at 966 nm, whereas the starch and cellulose mixtures showed a positive absorption band at 980 nm (data not shown). The difference spectra did not show any peaks other than these, so PLSR calibrations had to be developed.
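The subtraction described above can be sketched as follows. The direction of the subtraction (mixture minus puree) is an assumption, chosen so that a loss of water absorption appears as a negative band. The helper names and all absorbance values are hypothetical.

```python
def mean_spectrum(spectra):
    """Average replicate spectra wavelength by wavelength."""
    n = len(spectra)
    return [sum(col) / n for col in zip(*spectra)]

def difference_spectrum(mixture_spectra, puree_spectra):
    """Mean mixture spectrum minus mean puree spectrum (assumed direction)."""
    mix = mean_spectrum(mixture_spectra)
    pur = mean_spectrum(puree_spectra)
    return [a - b for a, b in zip(mix, pur)]

def strongest_band(wavelengths, diff):
    """Wavelength and value of the largest absolute difference."""
    i = max(range(len(diff)), key=lambda j: abs(diff[j]))
    return wavelengths[i], diff[i]

# Hypothetical replicate spectra around the water band.
wavelengths = [950, 958, 966, 974, 982]
puree = [[0.50, 0.58, 0.64, 0.60, 0.52],
         [0.52, 0.60, 0.66, 0.62, 0.54]]
mixture = [[0.48, 0.53, 0.55, 0.54, 0.50],
           [0.50, 0.55, 0.57, 0.56, 0.52]]

diff = difference_spectrum(mixture, puree)
band_nm, band_value = strongest_band(wavelengths, diff)
```

With these made-up numbers the largest difference is negative and falls at 966 nm, mirroring the kind of band reported for the sugar and acid mixtures.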

PLSR Calibration
To extract information from the spectra, several mathematical pretreatments (multiplicative scatter correction (MSC), standard normal variate (SNV) and Savitzky-Golay second derivative) were used to transform the spectral data and reduce baseline shift, overlapping peaks and scattering effects. The wavelength range (absorption band of the pure substances) was initially determined by examining PLSR model performance. Models were then developed to identify the wavelengths related to each component. The PLSR calibration statistics for the six substances are shown in Table 1. Second derivative spectra in the wavelength range 800-1000 nm were found to be appropriate for PLSR calibration. R² was 0.99, with SEC and SEP less than 0.5% w/w. Only two variables (PLSR factors) were used in each model. The regression coefficients of the PLSR models of the mixtures of each substance with puree showed high positive and negative values at various wavelengths.
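To make the link between a two-factor PLSR model and its regression coefficient plot concrete, the sketch below fits a single-response PLS model with the NIPALS algorithm and returns one coefficient per wavelength. It is an illustration under stated assumptions, not the Unscrambler routine used in this study: the spectra are synthetic, the band positions and the `water` values are hypothetical, and the function names are invented for the example.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pls1(X, y, n_factors):
    """PLS1 by NIPALS. Returns (b, b0) such that y_hat = b0 + dot(x, b)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(col) / n for col in zip(*X)]
    y_mean = sum(y) / n
    E = [[v - m for v, m in zip(row, x_mean)] for row in X]  # centred spectra
    f = [v - y_mean for v in y]                              # centred response
    R, P, q = [], [], []
    for _ in range(n_factors):
        w = [dot(col, f) for col in zip(*E)]            # weights, E^T f
        norm = dot(w, w) ** 0.5
        if norm < 1e-12:                                # nothing left to model
            break
        w = [v / norm for v in w]
        t = [dot(row, w) for row in E]                  # scores
        tt = dot(t, t)
        p_load = [dot(col, t) / tt for col in zip(*E)]  # spectral loadings
        q_a = dot(f, t) / tt                            # response loading
        # Rotated weights r, so that t equals the centred original X times r.
        r = list(w)
        for r_prev, p_prev in zip(R, P):
            c = dot(p_prev, w)
            r = [ri - c * rp for ri, rp in zip(r, r_prev)]
        # Deflate spectra and response before the next factor.
        E = [[e - ti * pj for e, pj in zip(row, p_load)] for row, ti in zip(E, t)]
        f = [fi - q_a * ti for fi, ti in zip(f, t)]
        R.append(r)
        P.append(p_load)
        q.append(q_a)
    b = [sum(q_a * r[j] for q_a, r in zip(q, R)) for j in range(p)]
    b0 = y_mean - dot(x_mean, b)
    return b, b0

# Synthetic spectra: a hypothetical analyte band plus a hypothetical water band.
wavelengths = list(range(800, 1001, 20))

def band(centre, width):
    return [math.exp(-((w - centre) / width) ** 2) for w in wavelengths]

analyte_band = band(920, 30)
water_band = band(970, 40)
added = [0.0, 5.0, 10.0, 15.0, 20.0]       # % w/w of added substance
water = [1.00, 0.90, 0.95, 0.80, 0.85]     # hypothetical relative water signal
X = [[0.01 * c * a + h * wb for a, wb in zip(analyte_band, water_band)]
     for c, h in zip(added, water)]

b, b0 = pls1(X, added, n_factors=2)
predicted = [b0 + dot(row, b) for row in X]
# Wavelengths ranked by the size of their regression coefficient.
ranked = sorted(zip(wavelengths, b), key=lambda wb: abs(wb[1]), reverse=True)
```

Plotting `b` against `wavelengths` gives the kind of regression coefficient plot discussed in this section, and the first entries of `ranked` are the candidate wavelengths. Because these synthetic spectra vary in only two independent ways, two factors reproduce the added concentrations almost exactly; measured spectra would not fit this cleanly.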
Measuring the spectra of pure substances is a well-known method of identifying NIR absorption bands. The spectra of the six pure substances were therefore scanned under the same conditions as the mixtures, and their original spectra showed some clear peaks. The regression coefficients of the PLSR models of the mixtures were then plotted together with the spectrum of each pure substance, and the wavelengths were identified (Figure 3).
The glucose spectrum showed one clear peak at 824 nm, but the regression coefficients of the glucose mixture showed high negative values at 820, 922 and 942 nm (Figure 3a). The sucrose spectrum showed four obvious peaks at 756, 918, 984 and 1026 nm, but the regression coefficients of the sucrose mixture showed high values at 914, 940 and 976 nm (Figure 3b). The citric acid spectrum displayed four peaks at 780, 898, 1008 and 1070 nm, but the regression coefficients of the citric acid mixture displayed high values at 902, 934 and 976 nm (Figure 3c).
The malic acid spectrum presented only three peaks at 780, 902 and 1014 nm, but the regression coefficients of the malic acid mixture presented high values at 842, 860, 902, 934 and 976 nm (Figure 3d). The starch spectrum exhibited peaks at 758, 922 and 992 nm, but the regression coefficients of the starch mixture exhibited high values at only two wavelengths, 922 and 988 nm (Figure 3e). Finally, the cellulose spectrum showed three broad peaks at 780, 924 and 1020 nm, whereas the regression coefficients of the cellulose mixture showed five peaks at 806, 826, 850, 918 and 970 nm (Figure 3f). The substance spectra and the mixture regression coefficient plots nevertheless shared peaks at some wavelengths: glucose and its mixture around 820 nm, sucrose and its mixture at 914 and 976 nm, citric acid and its mixture at 902 and 934 nm, malic acid and its mixture at 902 nm, starch and its mixture at 922 and 988 nm, and cellulose and its mixture at 918 nm.
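The comparison between pure-substance peaks and regression coefficient peaks can be expressed as a nearest-peak match within a tolerance. The sketch below uses the starch and glucose peak positions quoted above. The 4 nm tolerance and the function name are hypothetical choices made for illustration, not criteria stated in this study.

```python
def shared_peaks(pure_peaks, coef_peaks, tol_nm):
    """Pair each regression coefficient peak with the nearest pure-substance
    peak, keeping the pair only if the two lie within tol_nm of each other."""
    pairs = []
    for c in coef_peaks:
        nearest = min(pure_peaks, key=lambda p: abs(p - c))
        if abs(nearest - c) <= tol_nm:
            pairs.append((c, nearest))
    return pairs

# Peak positions (nm) as reported in the text for cv. Keitt.
starch_pure = [758, 922, 992]
starch_coef = [922, 988]
glucose_pure = [824]
glucose_coef = [820, 922, 942]

tolerance_nm = 4  # hypothetical matching tolerance
starch_shared = shared_peaks(starch_pure, starch_coef, tolerance_nm)
glucose_shared = shared_peaks(glucose_pure, glucose_coef, tolerance_nm)
```

With this tolerance, both starch coefficient peaks are matched to a pure-substance peak, while only the 820 nm glucose coefficient peak is. The outcome depends on the tolerance chosen, so any such matching rule should be reported alongside the result.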

Mango Fruit cv. Nam Dok Mai Si Thong
To confirm that wavelength identification and selection using the regression coefficients of PLSR models is accurate, and to study the effect of mango cultivar on these processes, mango fruit cv. Nam Dok Mai Si Thong was used in this experiment. Mango puree and mixtures of puree with the six pure substances were prepared by the same process as for mango cv. Keitt. Spectra in the wavelength range 700-1100 nm were measured using the NIRSystem 6500 fitted with the transportation module and cuvette cells (10 mm) in reflectance mode. The absorbance of mango puree cv. Nam Dok Mai Si Thong was higher than that of cv. Keitt, which was packed in a rotating cup and measured in reflectance mode, over the whole wavelength range tested (400-1100 nm) (Figure 4). The difference of the mean original spectra showed a trend similar to that of mango cv. Keitt, but with higher light scattering than cv. Keitt (spectra not shown).
PLSR calibrations therefore had to be developed.

The spectral data were then treated with the Savitzky-Golay second derivative (10 nm averaging on the left and right sides) before the models were developed. The spectrum of the mango puree showed only the carotenoid and water peaks, at 508 and 964 nm, as for mango puree cv. Keitt. PLSR was used to develop the models, and the results are shown in Table 2. R² was 0.99, and only two factors were used in all PLSR models. SEC and SEP were lower than 0.50 except for glucose and malic acid, and bias was almost zero. The regression coefficients of the PLSR models of the mixtures of each substance with mango puree cv. Nam Dok Mai Si Thong showed high values at the same wavelengths as for mango cv. Keitt. The regression coefficients showed high values close to absorption bands of the pure substance at 1006 nm for the glucose mixture, 916 and 987 nm for sucrose, 810, 894 and 1056 nm for citric acid, 900 nm for malic acid, and 922 and 976 nm for starch. The cellulose mixture did not show any value related to the absorption of the pure substance (Figure 5). Therefore, the mango cultivar had no effect on wavelength identification by the PLSR model.

Discussion
The mean original spectrum of the whole mango fruit showed a clear water absorption peak at 978 nm (Saranwong et al., 2003; Theanjumpol et al., 2010). Water is the main component (80%), so this absorption band is the strongest (Kawano, 2002). Another peak, at 476 nm, corresponds to the absorption of carotenoid (Berrardo et al., 2004), which is the main pigment in a ripe mango and gives it its yellow color (Chen et al., 2004; Ornales-Paz et al., 2008). The spectra of the puree and mixtures showed a pattern similar to that of the mean whole fruit spectrum, with a water absorption band at 976 nm and a carotenoid absorption band at 460 nm, because mango fruit contains water and carotenoids at approximately 81.71 g/100 g and 51.2 μg/g, respectively (Litz, 2009). Water molecules have absorption bands in the short-wavelength NIR region at 770, 960 and 970 nm, which can interfere with the absorption bands of other chemical components (Mohsenin, 1984; Contal et al., 2002). The whole mango fruit spectrum had higher absorption values than the puree and mixture spectra because of differences in optical path length. The spectra of whole mango fruit were measured in interactance mode, in which light penetrates more deeply beneath the surface of fresh fruit than in the reflectance mode used for the mango puree and mixtures. In the difference spectra, the negative water absorption band at 966 nm for the glucose, sucrose, citric acid and malic acid mixtures is due to the difference in the number of absorbing molecules. The negative band becomes progressively larger with increasing percentage of pure substance, since the number of water molecules progressively decreases (Giangiacomo, 2006). The starch and cellulose mixtures, in contrast, displayed a band at 980 nm, which reflects the complementary effect of the water, starch and cellulose absorbing molecules (Contal et al., 2002).
The PLSR calibration models of the six substance mixtures gave favorable results in the spectral range 700-1100 nm: high R² and low SEC, SEP and bias. They used only two variables. This is because sources of error arising from mango fruit structure, such as texture, peel, shape and size, were reduced (Miyamoto, 1998; Wills et al., 2007). The regression coefficients of the PLSR calibration models of the mixtures were then plotted together with the spectrum of each pure substance and used to identify wavelengths. A peak that appeared in a substance spectrum was not necessarily found in the regression coefficient plot of the mixture. This shows that the regression coefficient plot contains more information than the substance spectrum, since it includes the matrix (the other components and the texture of the puree), as in the mango fruit. It is therefore closely related to the absorption of the mixture, which is useful for wavelength selection. Moreover, the substance spectra and the mixture regression coefficient plots shared peaks at some wavelengths, which indicates that the substances belong to the same group: for instance, citric acid and malic acid (organic acids) absorbed at the same wavelength, 902 nm, and starch and cellulose (carbohydrates) absorbed at the same wavelength, 918 nm. Different groups of substances, however, could absorb at different wavelengths. This can be explained by their molecular structures, which are composed of different chemical bonds such as C−H, O−H and C=O; the number of chemical bonds also affects absorbance (Burns & Ciurczak, 2001; Osborne et al., 1993). Almost all of the peaks were in the wavelength range 902-988 nm, which includes water, carbohydrate and acid absorption bands arising from their chemical bonds vibrating in various ways, such as OH stretching (second overtone of OH stretch at 960 nm) and CH stretching (third overtone stretch at 910 nm) (Williams & Norris, 2001; Jamshidi et al., 2012; Zude et al., 2006). As up to 20% by weight of each substance was added to the puree, the calibrations were expected to rely heavily on the water content of the mixture, and indeed the wavelengths between 970 nm and 988 nm showed that water is the main contributor to the PLSR models for glucose, sucrose, cellulose, citric acid and malic acid. These results show that wavelength selection using the regression coefficients of PLSR models gives better results than using the pure substance spectrum alone.
For mango fruit cv. Nam Dok Mai Si Thong, the mixtures behaved similarly to those of cv. Keitt, but the overall absorbance was higher than for cv. Keitt. Since the reflectance mode provides a shorter optical pathlength under the surface, it generates smaller absorbance according to the Beer-Lambert law (Williams & Norris, 2001; Contal et al., 2002). The R² of the PLSR calibration models was equal to that for cv. Keitt, 0.99. The values of SEC, SEP and bias were also low, but slightly higher than for mango puree cv. Keitt. Since the variation of the initial chemical substances in cv. Nam Dok Mai Si Thong mango was higher than in cv. Keitt mango, there was a higher resulting variation of chemical substances in the mixed cv. Nam Dok Mai Si Thong puree (Suwapanich, 2006; Hofman et al., 1997). The PLSR models predicted the concentration of pure substance added to the puree, not the overall concentration of components after mixing. The regression coefficients of the PLSR models of the mixtures of each substance with mango puree cv. Nam Dok Mai Si Thong showed high values at the same wavelengths as for mango cv. Keitt. Therefore, the mango cultivar had no effect on wavelength selection by the PLSR model. The most effective wavelengths were 900-1000 nm for glucose and sucrose, 800-1000 nm for citric acid and malic acid, 900-1000 nm for starch and 800-1000 nm for cellulose.
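The pathlength argument made in this section follows from the usual statement of the Beer-Lambert law, written here for reference. The symbols are introduced for this illustration only: A is absorbance, ε the absorptivity of the absorbing species, c its concentration and ℓ the optical pathlength. The law holds strictly for clear, non-scattering samples, so for puree it is only an approximation.

```latex
A = \varepsilon \, c \, \ell
\qquad\Longrightarrow\qquad
\frac{A_1}{A_2} = \frac{\ell_1}{\ell_2}
\quad \text{for equal } \varepsilon \text{ and } c
```

For the same composition, absorbance therefore scales in proportion to the pathlength travelled by the light.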

Conclusion
Wavelength (variable) selection for NIRS evaluation of mango fruit quality could not be done using only the measured spectra of pure substances, because the absorption bands of the major substances shifted in the mixture, which is characteristic of the compounds tested. The regression coefficient plots of the PLSR models contained useful data for the development and future analysis of NIRS calibrations. The mango cultivar, however, had no effect on wavelength selection by the PLSR model. The suitable wavelengths were 900-1000 nm for glucose and sucrose, 800-1000 nm for citric acid and malic acid, 900-1000 nm for starch and 800-1000 nm for cellulose.
Symbols used in the definitions of SEC, SEP and bias, each of which is computed from a sum over samples of the differences between ŷ_i and y_i:
SEC: ŷ_i, NIR predicted value of sample i in the calibration set; y_i, measured value of sample i in the calibration set; n, number of samples in the calibration set.
SEP: ŷ_i, NIR predicted value of sample i in the validation set; y_i, measured value of sample i in the validation set; n, number of samples in the validation set.
Bias: ŷ_i, NIR predicted value of sample i; y_i, measured value of sample i; n, number of samples in the calibration or validation set.
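A minimal sketch of how these statistics can be computed from predicted and measured values follows. The exact expressions used in this study are not legible in the text, so the forms below are assumptions: bias is taken as the mean of predicted minus measured, and the standard error as the plain root mean square of the differences with n in the denominator. Other texts and software subtract the bias first or use n - 1 or similar denominators. The function names and the example values are hypothetical.

```python
import math

def bias(predicted, measured):
    """Mean of (predicted - measured); the sign convention is an assumption."""
    n = len(measured)
    return sum(p - m for p, m in zip(predicted, measured)) / n

def standard_error(predicted, measured):
    """Root mean square of (predicted - measured) with n in the denominator.
    Applied to the calibration set this plays the role of SEC, and applied
    to the validation set the role of SEP (assumed form)."""
    n = len(measured)
    return math.sqrt(sum((p - m) ** 2 for p, m in zip(predicted, measured)) / n)

def r_squared(predicted, measured):
    """Coefficient of determination as 1 - SS_residual / SS_total."""
    mean_m = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for p, m in zip(predicted, measured))
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot

# Hypothetical validation set: added concentration (% w/w) and model output.
measured = [0.0, 5.0, 10.0, 15.0, 20.0]
predicted = [0.2, 4.9, 10.1, 15.3, 19.8]

sep_value = standard_error(predicted, measured)
bias_value = bias(predicted, measured)
r2_value = r_squared(predicted, measured)
```

The same `standard_error` function gives a SEC-like value when it is passed calibration-set predictions, which is why only one function is defined for both statistics.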

Figure 1. Mean original spectra for whole mango fruit, puree, and mixtures of six pure substances: glucose, sucrose, citric acid, malic acid, starch and cellulose

Figure 3. Pure substances spectra and regression coefficient plot of PLSR models of glucose mixture (a), sucrose mixture (b), citric acid mixture (c), malic acid mixture (d), starch mixture (e) and cellulose mixture (f) of mango puree cv. Keitt

Figure 4. Mean original spectra of mango puree cv. Nam Dok Mai Si Thong and cv. Keitt in wavelength range 400-1100 nm

Figure 5. Pure substances spectra and regression coefficient plot of PLSR models of glucose mixture (a), sucrose mixture (b), citric acid mixture (c), malic acid mixture (d), starch mixture (e) and cellulose mixture (f) of mango puree cv. Nam Dok Mai Si Thong

Table 1. PLSR calibration statistics for mango puree cv. Keitt mixed with six substances at various concentrations between 0 and 20% (w/w)
F: number of factors used in the PLSR calibration model; R²: coefficient of determination; SEC: standard error of calibration (% w/w); SEP: standard error of prediction (% w/w); Bias: average of difference between actual value and NIR predicted value (% w/w).

Table 2. PLSR calibration statistics for mango puree cv. Nam Dok Mai Si Thong mixed with six substances at various concentrations between 0 and 20% (w/w)
F: number of factors used in the PLSR calibration model; R²: coefficient of determination; SEC: standard error of calibration; SEP: standard error of prediction; Bias: average of difference between actual value and NIR predicted value.