A Comparison of Using Dominant Soil and Weighted Average of the Component Soils in Determining Global Crop Growth Suitability

Soil parameters are key data inputs for crop suitability analysis. Soil databases are complex, offering soil mapping units made up of various component soils. In the case of the Harmonized World Soil Database, there can be up to 8 component soils per unit. In roughly one third of soil mapping units, the additional component soils take up more than 50% of the pixel share value. The estimate of a soil parameter value, such as pH, salinity or organic carbon content, may differ between the dominant soil component and the weighted average of all component soils. Understanding the effect of these differences on crop model outputs may allow the error to be quantified. In this study, we show the changes in crop suitability of 15 crops when using the parameter value estimates of the dominant soils versus a weighted average of the component soils. In the case of the latter, global crop suitability amounts to 54.5% of the earth's land surface, 1% more than when using the values of the dominant soils alone. Intrinsic regional differences in the quality of the soil database influence the distribution of crop suitability classes, especially in areas where share values of the dominant soil are low. The uncertainty range for the use of dominant versus component soils on the overall global crop suitability could be considered to be 1%, while that of each suitability class can amount to up to 4%.


Introduction
Ensuring food security for the global population is already challenging and will become even more so when the population rises to around 8.3 billion by 2030 (UNDP, 2008). Enhanced food production relies on three factors: increased yield, enhanced cropping intensity and the expansion of agricultural land (FAO, 2003). In 2009, the total area under agricultural and permanent crops amounted to 2.5 billion ha, which equals about 19% of the earth's land surface (Bontemps, Defourny, Van Bogaert, Arino, & Kalogirou, 2009). In the last four decades of the past century, 172 million ha of land were added in developing countries (FAO, 2003). To ensure global food security, an additional 120 million ha of converted land are projected to be necessary by 2030 and an extra 5% by 2050 (Bruinsma, 2009). Most land is expected to be transformed in South America and Sub-Saharan Africa (Fischer, 2000). Models based on climate and soil inputs can help discern the areas where crops can grow optimally under given natural conditions. Fischer et al. (2002) showed that roughly 2.8 billion ha are to some degree suitable for rain-fed agriculture, and Avellan, Zabel, and Mauser (2012) showed that about a quarter of the earth's land surface is suitable to highly suitable for the rain-fed growth of 15 major crops (Avellan, Zabel, & Mauser, 2012; Fischer, 2002). Both studies base their different models (Global Agro-ecological Zones versus fuzzy logic crop suitability) on global soil and climate databases. However, global soil databases are scarce and rely on patchy soil sampling. Few sets exist, such as the Harmonized World Soil Database (HWSD) (FAO/IIASA/ISRIC/ISSCAS/JRC, 2009) and the ISRIC-WISE derived soil properties on a 5 by 5 arc minute grid (Batjes, 2006). Global climate datasets are more varied. Past climate data can be obtained from interpolated station data (WorldClim), reanalysed forecasts (ERA) or hind-casted climate models (ECHAM, HadCM).
Avellan et al. (2012) showed that the quality of climate inputs is quite homogeneous while global soil databases can differ widely. The choice of the database can have a strong effect on the amount and distribution of crop suitable areas, leading to a 10% difference between the two most common global soil datasets (Avellan et al., 2012). Soil databases are immensely complex and the quality of the data is geographically diverse. For example, the HWSD is made up of four different input databases, each covering different areas of the world and using different sampling and compilation methods (FAO/IIASA/ISRIC/ISSCAS/JRC, 2009) (see Figure 1). Each pixel can contain up to 8 component soils, and the additional component soils may, in sum, have a larger share within the pixel than the dominant soil class (see mock-up example in Figure 2). When component soil classes are taken into account, the soil parameter value estimate for a given pixel may differ from that of the dominant soil mapping unit (e.g., the dominant soil value for pH is 8, but the weighted average of all component soils is 7.8).
In order to enhance modelling results, a balance between the quantity and quality of the input parameters has to be maintained. While more parameters might refine the modelling results, poor quality parameters might, in fact, be counterproductive. A careful analysis of both the quality of the data and their influence on final results might inform the choice of parameters. In Avellan et al. (2012), we started our crop suitability analysis using only the parameter value estimates of the dominant soil mapping unit of the topsoil (0-30 cm) on a pixel by pixel basis. In comparison, the Global Agro-ecological Zones studies used soil parameters from all component soils, top- and subsoils (0-30 cm and 30 cm and below), phases as well as management practices (IIASA/FAO, 2012). It is clear to the authors that other parameters relevant to soil databases, such as subsoil parameters (30 cm and below), including drainage, granularity or acidity, as well as phases and management practices, can have drastic effects on crop growth (Benjamin, Nielsen, & Vigil, 2003; Kirchhof et al., 2000; Van den Akker, Arvidsson, & Horn, 2003).
To our knowledge, the use of parameters in crop suitability models has not been substantiated by an analysis of the quality of the data. The inclusion of factors is defended by referring to standard works (e.g., FAO manuals (FAO, 1976, 2007) or similar) without questioning the validity of the usage. It is our intent to enhance model complexity in a step-by-step approach while showing the error margins incurred. Analogous to the well-known uncertainty ranges of climate models, we wish to demonstrate a similar approach for crop suitability estimations. Here, we assessed the influence of the area-weighted average of the additional component soils of the soil mapping units of the topsoil on the amount and distribution of crop suitable areas.
Regions were defined by their economic relevance in global trade because the biophysical crop model was coupled to a Global Equilibrium Model in a subsequent step (Table 1).

Dominant vs. Component Soil Areas and Soil Parameter Value Estimates
Dominant soil is defined as the HWSD component soil with the largest share value, irrespective of the fact that the other component soils together may have a larger share within one pixel. Soil parameter value estimates are the values each pixel has for a chosen parameter, e.g., pH or salinity. Figure 2 shows, in a mock-up example, how a pixel can be made up of several component soils and the effect the weighted average has on the parameter value estimate.
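The difference between the two estimates can be illustrated with a minimal sketch. The two-component pixel below is a hypothetical mock-up chosen to reproduce the pH example given in the Introduction (dominant value 8, weighted average 7.8); the class and function names are illustrative assumptions and are not part of the HWSD or of the original analysis.

```python
from dataclasses import dataclass


@dataclass
class ComponentSoil:
    share: float  # share of the pixel held by this component, in percent
    ph: float     # parameter value estimate for this component (pH here)


def dominant_value(components):
    """Return the parameter value of the component with the largest share."""
    return max(components, key=lambda c: c.share).ph


def weighted_average(components):
    """Return the share-weighted mean of the parameter over all components."""
    total_share = sum(c.share for c in components)
    return sum(c.share * c.ph for c in components) / total_share


# Hypothetical pixel: a dominant soil (60%) and one additional component (40%).
pixel = [ComponentSoil(share=60, ph=8.0), ComponentSoil(share=40, ph=7.5)]

print(dominant_value(pixel))    # value of the dominant soil only
print(weighted_average(pixel))  # value when all component soils are used
```

Dividing by the sum of the shares keeps the average well defined even if the shares of a mapping unit do not add up to exactly 100%.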
Figure 2. Mock-up examples of two pixels with different distributions of component soils (left); effect of using the weighted average on the overall parameter value estimate versus using that of the dominant soil (right)
We used GIS techniques to determine the area of prominence of dominant soils and compared it in size to that where component soils had higher percentages. We used Mondrian (version 1.2), an open source statistical analysis tool (University of Augsburg, 2012), to study the distribution of dominant soil units and component soil units. For the spatial representation of the soil units, a FORTRAN program was designed that allowed assigning the soil unit share to each pixel.
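As a rough illustration of this comparison, the sketch below counts the soil mapping units in which the dominant soil holds more than half of the share. It is written in Python for illustration only and is not the GIS, Mondrian or FORTRAN workflow used in the study; the list of share values is invented, and the 50% threshold is the one used in the text.

```python
def dominant_share(shares):
    """Share (in percent) of the component with the largest share."""
    return max(shares)


def fraction_dominant_majority(units, threshold=50.0):
    """Fraction of mapping units whose dominant soil exceeds the threshold."""
    majority = sum(1 for shares in units if dominant_share(shares) > threshold)
    return majority / len(units)


# Hypothetical mapping units, each given as the shares of its component soils.
units = [
    [100.0],             # a single soil, as in the Soil Map of China
    [60.0, 40.0],        # dominant soil plus one additional component
    [40.0, 30.0, 30.0],  # additional components outweigh the dominant soil
    [70.0, 20.0, 10.0],
]

print(fraction_dominant_majority(units))
```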

Determination of Crop Suitable Areas
We used the fuzzy logic approach as discussed in Avellan et al. (2012). Fuzzy classification methods define growth through membership functions and likelihoods (Burrough, MacMillan, & Deursen, 1992). The rationale behind this is that most soil parameters have a large inherent error rate, due to sampling and handling errors, and that crops are able to grow at various levels of these parameters (Rossiter, 1996). Thus, strict Boolean classification systems may be too restrictive in growth ranges and areas. Fuzzy logic approaches have been used by other authors for a selected number of crops on limited study areas (e.g., Baja, Chapman, & Dragovich, 2002; Braimoh, Vlek, & Stein, 2004; Reshmidevi, Eldho, & Jana, 2009; Van Ranst, Tang, Groenemans, & Sinthurahat, 1996).
Raster-based soil, terrain and climate parameter values were matched on a sliding scale from 0 to 1 with their respective crop growth likelihoods as determined by Sys, Van Ranst, Debaveye, and Beernaert (1993) (Figure 3a). Subsequently, the best matching crop was selected as the most suitable for a given pixel. Each component soil was assigned one fuzzy value (Figure 3b). Depending on the number of component soils in each soil mapping unit, up to 8 fuzzy values per pixel were assigned. These were aggregated based on their weighted share value of the respective soil mapping unit. Component soils with high share values thus have a stronger influence on the final fuzzy value.
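The share-weighted aggregation of fuzzy values can be sketched as follows. This is a simplified illustration under stated assumptions, not the model of Avellan et al. (2012): it uses a single soil parameter (pH), a trapezoidal membership function, and two invented crops whose breakpoints are placeholders rather than values from Sys et al. (1993).

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 0 outside (a, d), 1 on [b, c], linear between.

    Assumes a < b <= c < d. The trapezoidal shape is an assumption made
    for this sketch.
    """
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


def pixel_fuzzy_value(components, crop_range):
    """Share-weighted mean of the fuzzy values of all component soils."""
    total_share = sum(share for share, _ in components)
    weighted = sum(share * trapezoid(ph, *crop_range) for share, ph in components)
    return weighted / total_share


def most_suitable_crop(components, crop_ranges):
    """Name of the crop with the highest aggregated fuzzy value."""
    return max(crop_ranges, key=lambda crop: pixel_fuzzy_value(components, crop_ranges[crop]))


# Hypothetical pH breakpoints (a, b, c, d) for two placeholder crops.
crop_ranges = {
    "crop_a": (4.5, 6.0, 7.5, 8.5),
    "crop_b": (5.0, 6.5, 8.0, 9.0),
}

# Hypothetical pixel: (share in percent, pH) for each component soil.
components = [(60.0, 8.0), (40.0, 7.5)]

for crop, crop_range in crop_ranges.items():
    print(crop, pixel_fuzzy_value(components, crop_range))
print(most_suitable_crop(components, crop_ranges))
```

With only the dominant soil, the component list would contain a single entry, so the same functions cover both variants compared in this study.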
Crop growth abilities were then categorized into four subsets as defined by Sys et al. (1993) and FAO (1976). Fuzzy value between: 1) 0-0.4 Pixel not suitable for crop growth (N) (none).
Pixels are subsequently transformed into land surfaces according to their location on the globe through a FORTRAN programme. The total land surface is considered except Antarctica.
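The reason pixels have to be converted by location is that a cell of fixed angular size covers less ground towards the poles. A minimal sketch of such a conversion is given below; it assumes a spherical earth with a mean radius of 6371 km and a regular latitude-longitude grid, and it is not the FORTRAN programme used in the study.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; spherical approximation (assumption)


def cell_area_km2(lat_south_deg, cell_size_deg):
    """Area of a grid cell whose southern edge lies at lat_south_deg.

    Uses the spherical formula R^2 * dlon * (sin(lat_north) - sin(lat_south)).
    """
    lat_south = math.radians(lat_south_deg)
    lat_north = math.radians(lat_south_deg + cell_size_deg)
    dlon = math.radians(cell_size_deg)
    return EARTH_RADIUS_KM ** 2 * dlon * (math.sin(lat_north) - math.sin(lat_south))


# The same one-degree cell shrinks markedly from the equator to 60 degrees north.
print(cell_area_km2(0.0, 1.0))
print(cell_area_km2(60.0, 1.0))
```

Summing the areas of all pixels in a suitability class, rather than counting pixels, gives the share of the land surface reported for that class.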

Dominant vs. Component Soil Areas
In 64% of all pixels, the dominant soil holds more than 50% of the pixel's share value. When looking at specific major soil groups, some only exist as dominant soil types (i.e. Is-Lithosols, Ns-Nitosols, U-Rankers and W-Planosols). Most soil mapping units comprise only two component soils (i.e. the dominant soil plus one additional component soil). Few cases exist where soil mapping units have 6 or more component soils. The share value of the dominant soil component is very high in most of northern Asia, Greenland, North America and large parts of Africa. These are areas where the dominant soil defines the parameter value estimate (grey areas in Figure 4). In the case of China, due to the way the database was produced, only one soil, the dominant one, exists. In the Middle East, Central Asia, the Pacific and Australia, share values of the dominant soil component were very low. These are areas where the other component soils play a larger role in determining the parameter value estimates of the given pixel (black areas in Figure 4, see also the mock-up example in Figure 2). South America exhibits mostly areas with intermediate share values (data not shown explicitly).

Determination of Crop Suitable Areas
When using the parameter value estimates of the dominant soil mapping units along with climate and terrain constraints, 9% of the earth's surface is classified as highly suitable (S1), 25% as suitable (S2) and 19% as marginally suitable (S3) (Figure 5). Barley (10.7%), wheat (5.6%), and oil palm (5.2%) are globally the most suitable crops (Figure 6; percentages of all pixels, not of area).
When considering the parameter value estimates of all component soils in a given pixel, the area suitable for crop growth amounts to 54.5% of the earth's land surface excluding Antarctica. Roughly 4.5% can be categorised as highly suitable (S1); 27% and 23% can be classified as suitable (S2) and marginally suitable (S3), respectively (Figure 5). The most prominent crops were the same as when using dominant soils only, with adjustments in their overall percentages (barley 11.1%, wheat 6.5%, oil palm 5.9%) (Figure 6).
In a few cases of crop modelling, authors have undertaken extensive quality control of the underlying soil data and adapted it to their needs (Gijsman, Thornton, & Hoogenboom, 2007; Romero et al., 2012). This is very cumbersome and can only be carried out when sufficient expert staff are available for a specific target objective. However, soil datasets are used widely by differing disciplines. We suggest explaining the inherent uncertainty attached to these datasets and laying open the error margin of their use. In this particular case, the use of all component soils versus only the dominant soils, we postulate that the error margin is about 1% at a global scale.
It is clear to the authors that additional parameters from the soil databases can be used, as well as a variety of other parameters such as refined climate datasets, in particular at the temporal scale. Knowledge of ethnicity, gender, management practices, adapted crops, irrigation, and the use of fertilizers and technology are all factors that influence the suitability of an area for agricultural purposes (FAO, 2007). Obtaining reliable data for these parameters may be even more challenging than for soil databases.

Conclusion
In this study, we intended to show the differences in model results when using all component soils rather than only the dominant soil for the analysis of crop suitability. This is important because it allows determining the level of uncertainty that modellers face when using current global soil databases. Including more parameters does not always mean better results. We showed that the distribution of the number of component soils of the HWSD is very heterogeneous on a geographical scale but is not linked to the quality of the underlying data subset. The error range for using either the dominant component soil or all component soils could be considered to be 1%, the difference in crop suitable area between the two datasets. The margin of error varies by region and increases to up to 4% when looking at the individual suitability classes.

Figure 1. Distribution of the four underlying databases of the Harmonized World Soil Database (HWSD): European Soil Database (ESDB), Soil Map of China (CHINA), Soil and Terrain dataset (SOTWIS), Digital Soil Map of the World; adapted from (FAO, IIASA, ISRIC, ISSCAS, & JRC, 2009)

Figure 3. Overview of the methodology of fuzzy logic crop suitability analysis using just the parameter value estimates of a) the dominant soil (top) or b) of all component soils (bottom)

Figure 4. Analysis of shares and sequences of component soils. Grey areas represent soil mapping units where the dominant soil component holds a share value of more than 50%; black areas are regions where the dominant soil component holds a share value of less than 50%

Table 1. Coding of the regions
SEA: Cambodia, Laos, Thailand, Vietnam, Myanmar, Bangladesh
USA: United States of America

Region-specific changes in crop suitability areas by category using dominant soil parameter value estimates (d) or component soils (c). S3: marginally suitable, S2: suitable, S1: highly suitable

Now, how to choose which dataset to use? The quality for all component soils is heterogeneous; the effect on the extent and type of crop suitability is minimal. The lack of consistent quality of global datasets is a known issue. A variety of research centres are working towards enhanced soil datasets and sampling, often in collaboration with many others, such as in the Global Soil Initiative launched in 2011 (The Global Soil Partnership, 2011).