A Note on the Normalized Definition of Shannon's Diversity Index in Landscape Pattern Analysis

A common approach for quantifying landscape pattern through landscape metrics is to use categorical maps of an entire landscape. However, there is growing interest in using sampling data, where the data are collected for only a small fraction of the entire landscape. In sample-based approaches, some currently used landscape metrics may not be estimable because they are defined on mapped data. Shannon's diversity index is a frequently used metric in landscape pattern analysis. In this study, the performance of the normalized Shannon's diversity index is demonstrated using sampled full-coverage maps and then point sampling on those maps. Artificial and real landscapes were employed for this purpose. The results showed that calculating the normalized Shannon's diversity index based on the number of land cover types in the entire classification system is more appropriate than basing it on the number of land cover types present within the landscape. There was a strong positive correlation between reference and estimated values, but the estimator of Shannon's diversity index was slightly negatively biased. In conclusion, some currently used landscape metrics need to be slightly redefined to accommodate sampling data.


Introduction
Landscape pattern has received much attention from landscape ecologists over the past decades. A fundamental assumption is that many ecological phenomena, such as population dynamics and biodiversity, can be affected by landscape pattern (e.g., van Dorp & Opdam, 1987; Turner, 2005; Hernandez-Stefanoni & Dupuy, 2008). To understand pattern-process relationships, landscape pattern needs to be quantified first. For this purpose, a set of spatial indices, so-called landscape metrics, has been developed (McGarigal et al., 2012), since direct measurement of landscape pattern is difficult (Traub & Kleinn, 1999). These metrics are defined on measurable patch attributes such as the number, size and perimeter length of patches; a patch is defined as a relatively homogeneous area that differs from its surroundings (Forman, 1995). Landscape metrics are classified into two general categories, composition and configuration, which capture different aspects of landscape pattern (Gustafson, 1998). Configuration refers to the geographical distribution of patches, whereas composition refers to the variety and abundance of different land cover types within the landscape. Shannon's diversity index is a typical example of a composition metric. This index is frequently used in landscape ecological studies to describe landscape diversity (e.g., Turner et al., 2001; Ramezani et al., 2010; Corona et al., 2011).
A common approach for quantifying landscape pattern through landscape metrics is to use categorical maps, for instance land cover/land use maps, of an entire landscape (O'Neill et al., 1988; Wu et al., 2002; Li et al., 2005), and the software FRAGSTATS (McGarigal et al., 2012) is usually used for this purpose. However, a competitive alternative is to use sampling data for the estimation of the metrics (Hunsaker et al., 1994; Kleinn, 2000a; Kleinn, 2000b; Loveland et al., 2002; Griffith et al., 2003a; Ramezani et al., 2010; Hassett et al., 2011; Ramezani & Holm, 2011, 2012). Sample-based approaches take less time (and therefore cost less), and a well-designed and well-executed sample survey may yield more accurate results. For instance, Corona et al. (2004) demonstrated that, in the estimation of the total length of linear features within a landscape, the line intersect sampling (LIS) method appears to be more reliable than the polygon delineation approach (complete assessment). It was argued that in the LIS procedure the assessment is carefully conducted along the sampling units (line transects) only. Thus the possibility of polygon delineation errors can be considerably reduced (Carfagna & Gallego, 1999). In sample-based approaches, the data are collected for only a small fraction of the entire landscape, for instance on a sample of aerial photos (Dramstad et al., 2002; Ståhl et al., 2011) or satellite images (e.g., Hunsaker et al., 1994; Griffith et al., 2003b; Stehman et al., 2003). Kleinn (2000a) demonstrated that some landscape metrics can be derived from field-based forest inventories. Landscape metrics currently in use were originally defined on full-coverage map data; hence, some metrics may not be calculable from sample-based data. Furthermore, some of the metrics may produce similar numerical values for different landscape patterns (Tischendorf, 2001). Thus, there is an urgent need either to develop new metrics (Kleinn, 2000a) or to slightly redefine currently used landscape metrics for use with sample-based data. Recently, such a modification was made for the contagion metric (Ramezani & Holm, 2012), which was originally defined on full-coverage maps (O'Neill et al., 1988; Li & Reynolds, 1993) and was adapted for use with point sampling data.
In this study, the performance of the estimator of the normalized Shannon's diversity index is demonstrated using sampled full-coverage maps and the point sampling method applied to those maps. The performance of the index is demonstrated for twenty 1 km² real landscapes (with seven land cover types) and two artificial landscapes.

Shannon's Diversity Index (H)
This index refers to the variety and abundance of different land cover types within a landscape (Shannon, 1948; Turner et al., 2001; McGarigal et al., 2012). It is a general index of diversity and its value, if normalized, ranges from 0 to 1. The value of the index tends to 1 when the land cover types present have roughly equal proportions or when a high number of cover/use types is actually present. A low value means that the landscape is dominated by one land cover type; in other words, the landscape has less diversity. The possibility of comparing landscapes with different numbers of land cover types is an advantage of the normalized Shannon's diversity index over the usual definition (i.e., without the normalizing factor ln(s)). The estimator of H is defined as

$$H = -\frac{\sum_{j=1}^{s} p_j \ln(p_j)}{\ln(s)} \qquad (1)$$

where s is the number of land cover types and $p_j$ is the area proportion of the jth land cover type. In Eq. 1, s can either be treated as the number of land cover types considered in a given classification system (assumed fixed) or as the number of land cover types present within the landscape (a random variable). The proportion $p_j$ will be 0 for those land cover types that are not present, with $p_j \ln(p_j)$ simply set to 0.
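Eq. 1 can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the study; the function name `normalized_shannon`, its argument names, and the convention of returning 0 when fewer than two types are counted are assumptions made here.

```python
import math


def normalized_shannon(proportions, s=None):
    """Normalized Shannon's diversity index, H = -sum(p_j ln p_j) / ln(s).

    proportions: area proportions of the land cover types (must sum to 1).
    s: number of land cover types used for normalization. If None, the
       number of types present (p_j > 0) is used instead of a fixed
       classification-system size.
    """
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    # Terms with p_j = 0 are set to 0, so absent types are simply skipped.
    present = [p for p in proportions if p > 0]
    if s is None:
        s = len(present)
    if s < len(present):
        raise ValueError("s cannot be smaller than the number of types present")
    if s < 2:
        # ln(1) = 0 would give 0/0; returning 0 is a convention assumed here.
        return 0.0
    entropy = -sum(p * math.log(p) for p in present)
    return entropy / math.log(s)
```

Calling `normalized_shannon(p)` normalizes by the number of types present, whereas `normalized_shannon(p, s=7)` normalizes by a fixed seven-class system; the two choices correspond to the two treatments of s described above.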

Examples
In large-scale surveys, field-based inventory methods are frequently combined with remote sensing data, for instance in a two-stage sampling design (Thompson, 2002; Gregoire & Valentine, 2008). In such a design, remote sensing data (e.g., aerial photos) are sampled in the first stage, and in the second stage a basic sampling method (e.g., point sampling) is applied within each first-stage sampling unit. Note that the remote sensing data sampled in the first stage can be treated as full-coverage data in the second stage. In this study, 1 km² real landscapes served as the full-coverage data over which point sampling was conducted in the second stage.

Artificial Landscape
With this example, I attempt to illustrate how different definitions of the normalizing factor ln(s) influence the value of the normalized Shannon's diversity index in full-coverage mapping and sample-based approaches. Figure 1 shows two artificial landscapes with the same extent but with different numbers of land cover types present. The normalized Shannon's diversity index of these landscapes was calculated based both on the number of land cover types actually present and on the number of land cover types in the classification system, which is a fixed number (in this example, seven land cover types).
Figure 1. Two landscapes with the same extent but a different number of land cover types; landscape A with two land cover types and landscape B with four land cover types

Real Landscapes with Point Sampling Method in Second Stage
The analysis was conducted on twenty 1 km² real landscapes (sampling units) from the National Inventory of Landscapes in Sweden (NILS; Ståhl et al., 2011). These sampling units were randomly selected and distributed across Sweden. The already delineated land cover maps (in a GIS environment) were used to calculate the area proportions of the different land cover types and thus the reference value of the index. To estimate the index value in the second stage, a systematic point sampling design was applied to each delineated land cover map (see Figure 2). The bias and variance of the Shannon's diversity estimator were derived from a large number of independently replicated samples (1000 replicates) for each sample size (i.e., 49, 100, 225, and 400 points) and each real landscape. The mean of the estimates converged since the samples were independently selected. Note that each systematic sample was drawn with a random starting point.
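The replication procedure described above can be sketched as a small Monte Carlo simulation. The code below is an illustration under stated assumptions, not the procedure applied to the NILS maps: the raster produced by `make_toy_map` is a hypothetical three-class landscape, and all function names are invented for this sketch. Only the design elements (square systematic point grid, random starting point, 1000 independent replicates, sample sizes of 49, 100, 225 and 400 points, seven classes in the classification system) follow the text.

```python
import math
import random
import statistics


def normalized_shannon(proportions, s):
    """H = -sum(p_j ln p_j) / ln(s), with s fixed by the classification system."""
    entropy = -sum(p * math.log(p) for p in proportions if p > 0)
    return entropy / math.log(s)


def make_toy_map(size=120):
    """Hypothetical categorical raster with three blocky cover types (codes 0 to 2)."""
    grid = []
    for row in range(size):
        if row < size // 2:
            grid.append([0] * size)
        else:
            grid.append([1 if col < size // 3 else 2 for col in range(size)])
    return grid


def class_proportions(labels, s):
    """Share of each of the s classes among the given labels."""
    counts = [0] * s
    for label in labels:
        counts[label] += 1
    return [count / len(labels) for count in counts]


def systematic_sample(grid, points_per_side, rng):
    """Labels under a square point grid placed with a random starting point."""
    size = len(grid)
    spacing = size / points_per_side
    x0 = rng.random() * spacing
    y0 = rng.random() * spacing
    labels = []
    for i in range(points_per_side):
        for j in range(points_per_side):
            row = min(int(y0 + i * spacing), size - 1)
            col = min(int(x0 + j * spacing), size - 1)
            labels.append(grid[row][col])
    return labels


def simulate(grid, points_per_side, s, replicates, rng):
    """Monte Carlo bias and variance of the estimator for one sample size."""
    all_cells = [label for line in grid for label in line]
    reference = normalized_shannon(class_proportions(all_cells, s), s)
    estimates = [
        normalized_shannon(
            class_proportions(systematic_sample(grid, points_per_side, rng), s), s
        )
        for _ in range(replicates)
    ]
    bias = statistics.fmean(estimates) - reference
    return reference, bias, statistics.variance(estimates)


if __name__ == "__main__":
    rng = random.Random(2024)  # fixed seed so that reruns give the same numbers
    toy_map = make_toy_map()
    for k in (7, 10, 15, 20):  # 49, 100, 225 and 400 points
        ref, bias, var = simulate(toy_map, k, s=7, replicates=1000, rng=rng)
        print(f"n = {k * k:3d}  reference = {ref:.4f}  bias = {bias:+.5f}  variance = {var:.6f}")
```

Because the toy map is far more regular than a real landscape, the numbers it produces say nothing about the magnitudes reported in this study; the sketch only shows how bias and variance are obtained from replicated systematic samples.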

Results
The value of the normalized Shannon's diversity index was calculated for the two artificial landscapes. The value of the index was the same (i.e., 1) for both landscapes when the number of land cover types present within the landscape was used. However, the index value was 0.356 for landscape A and 0.712 for landscape B when the number of land cover types in the classification system (here seven) was applied. The relationship between reference and estimated values of the index for the twenty 1 km² real landscapes with a sample size of 225 points is shown in Figure 3. There was a high positive correlation between the reference and estimated values of the index, and this held for all four sample sizes.
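The two values for the artificial landscapes can be checked with a short calculation. The sketch assumes that the land cover types within each landscape have equal area proportions, which is inferred here from the reported index value of 1 under the present-types definition; the function and variable names are illustrative only.

```python
import math

S_CLASSIFICATION = 7  # land cover types in the classification system


def normalized_index(proportions, s):
    """H = -sum(p_j ln p_j) / ln(s); terms with p_j = 0 contribute 0."""
    entropy = -sum(p * math.log(p) for p in proportions if p > 0)
    return entropy / math.log(s)


# Equal proportions are an assumption inferred from the reported value of 1.
landscape_a = [0.5, 0.5]                # two land cover types
landscape_b = [0.25, 0.25, 0.25, 0.25]  # four land cover types

for name, props in (("A", landscape_a), ("B", landscape_b)):
    h_present = normalized_index(props, len(props))
    h_fixed = normalized_index(props, S_CLASSIFICATION)
    print(f"Landscape {name}: s = types present -> {h_present:.3f}, s = 7 -> {h_fixed:.3f}")
# Landscape A: s = types present -> 1.000, s = 7 -> 0.356
# Landscape B: s = types present -> 1.000, s = 7 -> 0.712
```

With equal proportions the index reduces to ln(2)/ln(7) for landscape A and ln(4)/ln(7) for landscape B, so only the fixed normalization separates the two landscapes.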
Detailed information on reference values (true values) and on the bias and variance of the Shannon's diversity estimator for the twenty real landscapes is provided in Table 1. As expected, the estimated variance of the Shannon's diversity estimator tended to decrease with increasing sample size. In some cases, however, variances were not exactly inversely proportional to sample size. This can be explained by the landscape pattern and the systematic sampling design applied. There was also a very small negative bias for the estimator. Ramezani et al. (2010) have already explained that this bias is due to the non-linear definition of Shannon's diversity index. Like the variance, the bias of the estimator tended to decrease as sample size increased.
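One standard way to see why a non-linear definition produces a negative bias is Jensen's inequality. The argument below is offered as an illustration, not as the derivation given by Ramezani et al. (2010). It assumes that s is fixed, that each estimated proportion $\hat{p}_j$ is unbiased for $p_j$, and, for the approximation in the last line, that $p_j > 0$; the hat notation is introduced here for the sample-based quantities.

```latex
% f(p) = -p \ln p is concave on (0, 1], since f''(p) = -1/p < 0.
\begin{align}
  \mathrm{E}\!\left[ f(\hat{p}_j) \right]
    &\le f\!\left( \mathrm{E}[\hat{p}_j] \right) = f(p_j)
    && \text{(Jensen's inequality)} \\
  \mathrm{E}[\hat{H}]
    = \frac{1}{\ln(s)} \sum_{j=1}^{s} \mathrm{E}\!\left[ f(\hat{p}_j) \right]
    &\le \frac{1}{\ln(s)} \sum_{j=1}^{s} f(p_j) = H \\
  \mathrm{E}[\hat{H}] - H
    &\approx -\frac{1}{\ln(s)} \sum_{j:\, p_j > 0} \frac{\mathrm{Var}(\hat{p}_j)}{2\, p_j}
    && \text{(second-order Taylor expansion)}
\end{align}
```

Under these assumptions the bias cannot be positive, and it shrinks together with the variances of the estimated proportions, which is consistent with the decrease in bias observed for larger sample sizes.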

Discussion
This paper demonstrates the need for caution when calculating currently used landscape metrics in sample-based approaches. A landscape metric is useful only if it can differentiate landscapes with different patterns. Depending on whether landscape pattern analysis is based on a full-coverage mapping approach or on a sample-based approach, an appropriate scaling factor (ln(s)) should be chosen.
In any stage of a multi-stage sampling design some land cover types may be missed, as demonstrated by Hunsaker et al. (1994), particularly when a detailed classification system is used (where a general land cover type may be divided into several more detailed sub-types). The results show that some metrics need slight modification to accommodate sampling data. Such modifications can lead to a straightforward interpretation of the metrics and also make it possible to differentiate two landscapes with different patterns. The results also show that reasonable precision can be achieved with a moderate sample size. In a highly fragmented landscape, where a large and continuous land cover type is broken into many small and isolated patches, a larger sample size is needed to achieve acceptable precision.
In this study, point sampling in the second stage was conducted on 1 km² land cover maps that had already been delineated; in practice, however, such a sampling survey can be performed on non-delineated maps (raw remotely sensed data). In such a situation, in addition to saving survey time (cost), the associated polygon delineation errors can be eliminated. The potential of sample surveys to eliminate these errors has been demonstrated by Corona et al. (2004) and Ramezani and Holm (2011), where the line intersect sampling (LIS) method was applied to estimate Shannon's diversity and forest edge length on non-delineated aerial photos. Figure 4 shows a systematic point sampling design with a random starting point over a raw aerial photo in which polygons are not delineated.

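Figure 2. An example of a 1 km² real landscape with a systematic point sampling design and a random starting point

Figure 3. Relationship between reference and estimated values of the index for the twenty 1 km² real landscapes with a sample size of 225 points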

Figure 4. An example of the systematic distribution of a point grid (point sampling) on raw (non-delineated) aerial photos

Table 1. The reference value of the normalized Shannon's diversity index and the estimated bias (a) and variance for the twenty 1 km² real landscapes, based on the fixed number of land cover types in the classification system (i.e., 7)
(a) Bias is the deviation between the estimator and the true parameter.