Exploratory Analysis of Sediment Geochemistry to Determine the Source and Dispersion of Ba , Fe , Mn , Pb and Cu and in Chihuahua , Northern Mexico

Exploratory Data Analysis (EDA) was applied to a sediment geochemistry data set (N = 2,584) within the Chihuahuan Desert in northern Mexico. Each data point contained information on the location, concentration of 19 elements, and lithology. A Box Plot analysis was applied to the data to determine mild and extreme anomaly thresholds. Anomaly maps revealed areas of concentration of anomalies as well as dispersion patterns for each element; Ba was the most mobile of the five elements and Pb the least. Specific potential sources of contamination were identified visually after superimposing anomalies with map layers of mines, urban centers and/or hydrology. The patterns revealed that mining was an important source of Pb, Fe, Mn and Ba contamination, while copper instead was associated to the presence of Tertiary volcanic rocks that abound in the western edge of the study area. Multi-element exploratory techniques yielded four principal components that accounted for 74% of total variance, exposing a strong association among Fe, Mn, P and Mg in PC1, Ca and Sr in PC2, Co, Cu, Cr, and Ni in PC3 and Pb in PC4. By identifying potential contaminant sources, EDA shed some light into the nature of the contamination, if natural or anthropogenic, a critical piece of information to impact assessment studies and best management plans.


Introduction
The large amount of soil and sediment data contained in the North America Soil Geochemical Landscapes Project undertaken by the Geological Surveys of the U.S., Canada, and Mexico, provides a good data coverage in these participating countries (Chiprés et al., 2009b).Although these data bases were originally intended for exploration of mineral resources, they provide inexpensive and accessible information for geochemical and environmental studies.These data sets are of special value to areas of harsh climate and limited accessibility such as deserts, and also to areas with limited economic resources due to the considerable cost incurred in sample collection and analysis of multiple samples that are needed to conduct a regional-scale study (Bounessah & Atkin 2003;Chiprés, Castro-Larragoitia, & Monroy, 2009a;Oyarzún et al., 2012).The use of sediment data bases offers the following additional advantages to environmental studies: (1) Generated maps of mild and extreme anomalies greatly enhance the identification of problem areas, (2) The methodology utilized in sampling and analytical analyses in geochemical datasets of national or international programs are standardized, strengthening the reliability of results obtained in the comparison among samples and statistical applications, and (3) Many potentially toxic elements are present at such low concentrations that special analytical procedures are required, e.g., sample concentration and/or use of sophisticated equipment.By taking advantage of their known association to elements which are easier to detect, specific areas where the toxic element may concentrate can be singled out.
Numerous geochemical studies have been conducted at a regional level, among them Ludington et al. (2006) in the U.S. northern Great Basin, Zumlot, Goodell, and Howari (2009) in New Mexico, and Lapworth et al. (2012) in Nigeria.Other studies have concentrated instead in a smaller area; for example, the characterization of mineralized areas in northeast Algeria (Bounessah & Atkin, 2003) and Catorce-Matehuala in northern Mexico (Chiprés et al., 2009a).
A combination of spatial analysis and exploratory data analysis (EDA) of sediment geochemical data has been applied with success to analyze the spatial distribution of contaminants in arid environments (Bounessah & Atkin, 2003;Oyarzún et al., 2012).This method has proven to produce more representative threshold values (Carranza, 2009;Chiprés et al., 2009a) and to generate results that are less affected by the presence of outliers.Statistical analyses of multi-element systems provide an excellent complement to these studies by determining the degree of association between elements (Zhang & Selinus, 1998;Carranza, 2009), separating samples according to origin (Santos et al., 2005) and validating threshold determination methods (Reimann, Filzmoser, & Garrett, 2005a;Reimann & Garret, 2005b).
In the present study, EDA and spatial analysis methods were applied to a dataset of sediment geochemistry covering a representative fraction of the Chihuahuan Desert in order to (1) determine the spatial distribution of background and anomalous concentrations of Ba, Fe, Mn, Pb and Cu, (2) identify possible sources of the anomalous concentrations as well as their dispersion patterns, and (3) evaluate significant associations among these and other elements.

Study Area
The Chihuahuan Desert is the second largest desert in North America, extending over several states in northern Mexico and into the U.S.This study focuses on a portion of the Chihuahuan Desert comprised within the state of Chihuahua, which encompasses 181,000 km 2 (70,000 sq.mi.) of land of relatively homogeneous topography and climate (Figure 2), The study area lies on a high plateau (800 to 1675 m above sea level) that receives an annual precipitation between 250 and 370 mm (10-12 in) and experiences extreme daily and seasonal temperature variations.
This area belongs to the basin and range geological province.During the Tertiary, this area was affected by extensive volcanism, which emplaced a large volume of rhyolites, ignimbrites and tuffs (McDowell & Clabaugh, 1981;Valencia-Moreno & Bryan, 2007).The exposed rocks include igneous (mainly Tertiary volcanic rocks and a small amount of non-volcanic rocks such as granodiorites, trachytes, and dacites), and sedimentary rocks (Jurassic and Cretaceous limestones and shales, and a smaller amount of Paleozoic-age limestones and shales), as shown in Figure 3.After grouping these different rocks into four major lithologies, Tertiary volcanic rocks covered 28.3% of the study area, Jurassic and Cretaceous sedimentary rocks 11.7%, and igneous non-volcanic and Paleozoic sedimentary rocks 2.6%.The rest of the area (57.3%) was covered by Quaternary alluvium.As can be seen in Figure 3, rock units alternate with alluvium in roughly NW-SE trending lines, with Tertiary volcanic rocks more prevalent near the western edge of the study area.Three large cities are located within this area, Juarez (pop.1.4 million), Chihuahua (pop.850,000), and Delicias (pop.130,000).Other towns include Casas Grandes, Ojinaga, and Aldama.Two rivers cross the study area, the Rio Conchos traversing it, and the Rio Grande along the US-Mexico Border.These rivers, their tributaries and other smaller ephemeral streams recharge aquifers, supply water for irrigation and for cattle and other grazing animals, and receive wastes (treated and untreated) from cities and communities along their path.As mentioned above, natural resources are being depleted at a fast rate and widespread environmental deterioration is evident in the region, significantly affecting grassland, riparian areas, and aquatic environments (Kelly & Arias Rojo, 2007;De la Maza, 2009).This situation has prompted governmental agencies to designate several areas within the study area as areas of concern or "regiones prioritarias" (CONABIO, 2012).
Historically, mining and cattle growing have been major economic activities for the region.From the first Europeans that arrived ca.1528, Chihuahua started a long history of mining; i.e., gold, silver, lead, iron, perlite and barite.Most mines within the study area are of hydrothermal deposit type (Clark & De la Fuente, 1978).Although many mines have been abandoned, some remain operational, among them San Antonio (Ag, Au, Pb), Bismark (Pb, Zn), Naica (Pb, Zn), La Perla (Fe), Las Fortunas (Ba), and Cerro Prieto (Ba) (Clark & Fitch, 2009;Gutiérrez & Carreon, 2009).Soils in the vicinity of Pb, Fe, and Ba mineralizations and/or their mine tailings were expected to show up clearly as anomalies within the study area.For this reason, these elements together with their associated elements Cu and Mn, were selected as target elements for this study.

Data Acquisition
Geochemical data were acquired from the Servicio Geologico Mexicano (SGM), and a smaller number of samples were collected by the authors.ArcMap TM was used to obtain and display the data point coverage.Samples belonging to Chihuahuan Desert were extracted according to desert boundary layer defined by the Terrestrial Ecoregion of México coverage (CONABIO, 2008), resulting in a data set of 2,925 sampling points.
The original geochemistry layer reported concentrations of major elements in % and for minor and trace elements in ppm.This data had to be put in a format consistent with the analysis.First, concentration values reported as % were converted to ppm (mg/kg solid), then elements that lacked data (concentration reported as its limit of detection) were removed, and the resulting data were then log transformed.Further, the data were divided into groups according to the lithology by intersecting the point coverage with the simplified geologic layer previously constructed.It is important to mention that all samples were composed of sediments and none of solid rock, but they were labeled after the lithology that was more closely associated with the sediment (upstream nearest outcrop).The four groups were, in order of larger to smaller: Quaternary alluvium (alluvium), Tertiary Volcanics (volcanics), Jurassic and Cretaceous sedimentary rocks (sedimentary) and other (plutonic).The latter group included everything else that did not belong to any the first three groups, and consisted of a small number of samples derived from a wide variety of rocks, mainly of plutonic type.Descriptive statistics by Box Plot were applied to each group in order to discriminate between background, mild anomalies, and extreme anomalies within each lithology.The resulting threshold values were back transformed.

Exploratory Data Analysis
Exploratory data analysis (EDA) is a technique developed in the 1980s to improve the interpretation of geochemical data (Kürzl, 1988).EDA has since then been effectively utilizedto analyze large data sets of sediment/soil geochemistry, whichby nature hold a close dependency to geology and thus tend to be non-normally distributed.EDA overcomes the non-normality limitation by generating instead 'potentially explicable' data patterns (Kürzl, 1988;Zhang & Selinus, 1998;Carranza, 2009;Chiprés et al., 2009b).In addition, EDA permits the grouping of data with respect to parent material and/or soil type (Carranza, 2009;Chiprés et al., 2009a) and the analysis of only selected elements (Zhang & Selinus, 1998;Reimann & Garret, 2005;Carranza, 2009).Common techniques utilized in EDA include density trace, scatterplot and box-plot diagrams (Kürzl, 1988).BoxPlot, also called Tuckey or Whisker diagram, is shown schematically in Figure 3.  (Tuckey, 1977), which can be mild or extreme.

Spatial Representation
Geographical Information Systems (GIS), i.e.ArcGIS TM , constitute an excellent tool to analyze and visualize the spatial distribution of data, and has been used extensively in the analysis of large geochemical data sets (Zhang & Selinus, 1998;Zumlot et al., 2008;Chiprés et al., 2009a, b).The main strength of GIS is that it allows the spatial management of a database where data can be analyzed, grouped or intersected to any other thematic information geographically represented (layers).
This study utilized the following layers: geographic coverage of concentrations for each target element (with three symbols representing each background and the two anomalies), a geographic coverage (map layer) for mines and prospecting, one for geology, one of urban areas, and one of hydrology.The potential sources of contaminants were identified primarily by visual comparison after superposing the anomalous concentrations to the above mentioned layers.For example, anomalies occurring in association with a particular outcrop and plotting all throughout it will be interpreted differently as if they would concentrate in only a few locations of the outcrop.In the latter, anomalies would be signaling a pollution point source, which can be mine tailings or a source directly upstream from the cluster of anomalies.
Finally, layers of hydrology, geology, mining, and urban areas were superimposed onto the anomaly map to discern the features that could explain the presence of the observed anomalies.Given that visual inspection of anomalies is not final validations of the source of the contaminant, additional interpretation of the results within the context of the particulars of the area as well as statistical manipulations are indispensable.

Principal Components
Besides identifying possible contaminant sources, sediment geochemistry studies investigate the relationships between the various geochemical variables in order to identify those variables that play a most important role.Towards this purpose, multivariate statistical analyses such as factor analysis, principal components, and cluster analysis are typically utilized (Zhang & Selinus, 1998;Bounessah & Atkin, 2003;Reimann & Garret, 2005;Zumlot et al., 2008;Oyarzún et al., 2012).
Because of the large difference in concentrations between major and minor elements and the deviation from normality, transformations were also needed prior to applying multi-element statistical analysis.A common procedure includes removal of censored data (values at or below detection limit) followed by a log transformation (Zhang & Selinus, 1998, Carranza, 2009), and standardization of the input data (Carranza, 2009).Similar to the Box Plot analysis, only elements reporting concentrations above detection limits were included in the analysis.The resulting data set comprised 19 elements and 2,584 data points.
To this new data set, logarithmic transformation and standardization procedures were applied as described by Carranza (2009).The standardization algorithm used is show by the equation Where Z ij represents the standardized data values for the lithology group j; X ij the original values i for the lithology group j; MED is the median value from X ij values; and MAD is the median absolute deviation, which is estimated as the median of absolute deviation of all data values from the data median.
The complete data (all 19 elements) were analyzedfor principal component analysis using Varimax Rotated Empirical Orthogonal Function (EOF-Var) using STATISTICA TM .

Box Plot
Back transformed Box Plot results for each lithology are listed in Table 1.Background threshold values entail crucial information for geochemical and environmental impact studies, as these values can be used to differentiate the soils as non-contaminated, mildly contaminated or extremely contaminated.The threshold values changed only slightly among the three major lithologies, as expected since alluvium formed after weathering of the other lithologies, plus they all share characteristics such as topography and climate.In contrast, thresholds for the less represented group labeled as plutonic varied widely, especially Cu and Pb upper fence values.The three major lithologies are discussed in more detail below since together they covered 97.4% of the area.The content of Mn and Pb were higher in sediments classified as volcanic, Ba was higher in sediments derived from sedimentary rocks, while the content oboth Fe and Cu were about the same for sedimentary and volcanic rocks.Besides the influence of parent rock, the observed differences in threshold values among groups reflected differential weathering of these lithologies, which it is known to gradually alter the mineralogy and mobility of elements.Another interesting factshown in Table 1 is a significantly higher number of anomalies for the upper end of concentrations compared to the number of anomalies obtained at the lower end.The box plot diagrams obtained for each target element are shown in Figure 4.
As seen in Figure 4, the median plots near the center for most log boxes, indicating that the log transformation was successful in bringing the data closer to a normal distribution.The exceptions were the Cu data for both volcanic and plutonic lithologies, which remained skewed in spite of the log transformation.

Distribution of Anomalies and Their Possible Sources
The anomalies were plotted following conventional symbols of crosses for the upper end (larger for extreme anomalies and smaller for mild anomalies), small dots for data within whiskers, and open circles for lower end anomalies (Figure 5).
As shown in Figure 5, the number of mild anomalies greatly exceeded the number of extreme anomalies.
Compared to the location of major cities and streams, anomalous concentrations plotted independently from urban centers, and with the exception of Ba, seemed to be unaffected by dissolution and by running water.The fact that only some of the anomalies clustered around known mines indicated that natural sources (e.g., a particular type of rock) were also important in explaining the high concentrations, with a shifting of the balance between these two sources varying with each element, as described below.
For Ba, most anomalies were mild in extent, and corresponded to enriched concentrations of natural sources located near Ba operating mines and/or Ba mineral deposits.The operating Ba mines are located within the Rio Conchos Basin (Gutiérrez & Carreón, 2009).The high solubility of Ba is evident as its anomalies followed the hydrology, downstream of the Rio Conchos and into the Rio Grande.In the adjacent closed basins, Ba anomalies also followed the hydrology, concentrating in the lower parts of the basins.The anomaly maps of Fe and Mn were very similar to each other, which may be explained by their joint occurrence in several mines, especially those located in the northern part of the study area.Extreme anomalies of Fe and Mn were few and plotted within the alluvium, suggesting that these two elements form iron oxides that coat the grains of soil and sediment.Mild anomalies were concentrated around Fe and Mn mineral deposits near operating sulfide mines, i.e., San Antonio mine.Surprisingly, no extreme anomalies of Fe were evident near La Perla, a large iron producing mine in the southern part of the study area where the Fe is found as Fe-oxide mineralization.The absence of anomalous Fe concentrations near this mine may be the result of the methods utilized in the extraction and transportation of the mined material, as the extracted iron is transported via slurry and shipped by pipeline to a smelter in the adjacent state of Coahuila.Fe dispersion by water was not evident from the anomaly maps, as the anomalous concentrations did not follow the hydrology pattern.It is interesting to note a prevalent low Mn content in sediments in the southern part of the study area, evident by the large open circles representing anomalous low concentrations in both alluvium and volcanic lithologies.
Although there are no Cu-producing mines within the study area, there are skarn deposits containing Cu and Zn oxides.One such Co-Zn skarn deposit is located 40 km north of the City of Chihuahua (Parkinson & Price, 2009).Several important operating copper mines are located within Tertiary volcanic rocks that outcrop along the western edge (Sierra Madre Occidental), outside the study area.As shown in Figure 5, few extreme Cu anomalies were located on alluvium in the western margin of the study area, while mild anomalies abound nearby those extreme anomalies.The anomaly threshold values among lithologies were significantly larger for volcanic and plutonic rock exposures, suggesting a natural Cu enrichment in these rocks.
Extreme Pb-anomalies were located in the vicinity of mines containing Pb, Zn, and Ag, and near prospects of sulfide mineralizations.Similar to the Cu anomaly pattern, mild Pb anomalies plot near the few extreme anomalies and over the alluvium.Clusters of mild anomalies are visible in two areas, by Casas Grandes and west of Delicias.In the northwest part of the study area (near Casas Grandes), the anomalies cluster around abandoned Pb-Zn mines, while those near Delicias plot near Pb-Zn-Cu prospects.From the dispersion pattern of the anomalies, Pb seems to be unaffected by the hydrology and dependent on areas containing Pb-Zn mineralization.

Multi-element Statistics
In order to verify that transformations had successfully brought the data closer to a normal distribution, scatterplots and histograms were constructed for each lithology.Scatterplots are another way to present the results obtained by box plot, and are included here only to show a popular option to report the distribution of a data set.For simplicity purposes, only the scatterplots and histograms of alluvium are included (Figure 6). Figure 6 shows the log transformed data following a distribution close to normal.The graphs obtained for the groups alluvium, volcanic, and sedimentary closely approached normality.Within these lithologies Fe and Mg were the least scattered evidencing their strong association.The relative homogeneity of the threshold values among the three dominant lithologies (alluvium, volcanics, and sedimentary) facilitates the separation between contaminated and non-contaminated limiting values.Having one threshold value per element for the whole area may be a valid approximation that would in turn facilitate the implementation of monitoring and environmental protection plans for the area.
The background concentrations for the Chihuahuan Desert are reported here for the first time and are important base values for future environmental and geochemical studies.The small variability of these values among the different outcropping lithologies imply that these values may apply for other portions of the Chihuahuan Desert outside the study area.
Anomaly maps provide a visual product that eases the identification of contaminated areas, and pinpoint to specific areas in which the efforts for containment and remediation should concentrate on.Future changes in land use or exposure of contaminated soil may change the pattern of anomalies, but that new path might be able to be predicted taking into account the mobility of the contaminant and the conditions (e.g., slope) of the location.

Conclusions
An EDA analysis was utilized to determine the thresholds for background, mild, and extreme geochemical anomalies.A simplified geologic map identified four main sediment lithologies within the study area: alluvium, volcanic, sedimentary, and plutonic.The last group was labeled as plutonic as the majority of rocks conforming it were plutonic, but consisted of any lithology that did not belong to any of the first three groups, and was represented by only 62 out of 2,595 sampling points (2.3 % of data).Excluding this last group, the threshold values varied only slightly among lithologies, which would endorse the decision to select one threshold per element for the whole study area with little loss in accuracy.
The spatial distribution of anomalies pinpointed possible sources of contamination and showed spatial dispersion patterns.Mining/prospecting regions were the most important sources for abnormally high concentrations of most target elements, followed by geology.This finding applied in reverse order for Cu.Dispersion patterns were Ba >> Fe, Mn> Cu >Pb.
Multi-element statistical analyses yielded agreeable results with those obtained by either box plot or anomaly maps and provided additional information on the associations among the five target elements and 14 other elements.Principal components PC1 to PC4 accounted for 75% of the total variance and incorporated each of the target elements in a different principal component group except for Fe and Mn which were both part of PC1.
Background values are a useful reference to present and future environmental studies, as they separate normal values from those that denote abnormally high (or low) values.Maps depicting abnormally high values show areas that are either contaminated or that merit additional study.The identification of such areas is an important first step towards remediation of a polluted area, and can also be used as a preventive measure if it is used to identify areas prone to pollution.

Figure 1 .
Figure 1.Location of the study area, showing the sampling point coverage, rivers, and major cities

Figure 3 .
Figure 3. Schematic representation of Box Plot method

Figure 4 .
Figure 4. Log Boxplot results of Ba, Cu, Fe, Mn, and Pb according to lithology

Figure 5 .
Figure 5. Spatial distribution of anomalous concentrations for Ba, Cu, Fe, Mn, and Pb

Figure 6 .
Figure 6.Scatterplot diagrams for log transformed data of the alluvium lithology

Table 1 .
Box Plot results of target elements according to lithology; upper inner fence (UIF), lower inner fence (LIF), and No. of anomalies.The results were back transformed after conducting the analysis using log transformed data 4.3.1 Principal ComponentsPrincipal component analysis using Varimax Rotated Empirical Orthogonal Function (EOF-Var) produced four components that explained 75% of the total variance, as shown in Table2.PC1 grouped Al, Be, Fe, P, K, and Mn; PC2 Ba, Ca, Sr, Mg and U, PC3 Co, Cu, Cr, and Ni, and PC4 Na and Pb.The results agreed with the behavior shown by anomaly maps, as each element appears in a different principal component, except for Fe and Mn which appear in PC1.This technique shows an independent behavior for each element (except Fe and Mn) but groups each of them with other elements with which they do associate with.