Accuracy and Conservatism of Vapour Intrusion Algorithms for Contaminated Land Management

This paper provides a view on the suitability of screening-level vapour intrusion (VI) algorithms for contaminated land management. It focuses on the accuracy and level of conservatism of a number of screening-level algorithms used for VI into buildings. The paper discusses the published evidence on the accuracy of VI algorithms and puts the conservatism in a broader regulatory perspective, including advice on how to deal with variation. In closing, further research needs are identified.

These criteria allow algorithms, when predictions are contrasted with observations, to be compared in terms of accuracy (Provoost et al., 2008b). The lower the score for the criteria, the more closely an algorithm's predicted concentrations match the observed concentrations.
The definition of conservatism varies depending on the context in which it is used; for VI algorithms it is defined as the extent to which the predictions of an algorithm take into account variation as a result of variability and uncertainty (Eklund & Burrows, 2009; Johnston & MacDonald, 2010; McAlary et al., 2009; Labieniec et al., 1996), while maintaining a sufficient level of protection. This mostly involves the application of conservative assumptions in a deterministic approach. For the probabilistic approach, the use of the 95th percentile of the probabilistic risk distribution is recommended by the EPA (2001) and Ferguson (1999) as providing a sufficient level of protection. The level of conservatism can be determined by comparing the deterministically predicted concentrations to the 95th percentile of the probabilistic distribution, or alternatively to the median observed concentration.
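The accuracy comparison described above can be sketched as follows. The three criteria below (log-scale RMSE, mean bias, and maximum error) are common residual-based choices that stand in for, rather than reproduce, the exact criteria of Loague and Green (1991); the example input values are invented for illustration:

```python
import numpy as np

def accuracy_scores(predicted, observed):
    """Compare predicted with observed concentrations on a log10 scale,
    since vapour intrusion data typically span orders of magnitude (OoM).
    Lower scores mean the algorithm predicts closer to the observations."""
    r = np.log10(np.asarray(predicted, float)) - np.log10(np.asarray(observed, float))
    return {
        "rmse_oom": float(np.sqrt(np.mean(r ** 2))),  # overall error, in OoM
        "bias_oom": float(np.mean(r)),                # > 0: over-prediction (conservative)
        "max_oom": float(np.max(np.abs(r))),          # worst single prediction, in OoM
    }

# Hypothetical example: an algorithm that over-predicts by about 1 OoM
scores = accuracy_scores(predicted=[1e-3, 5e-4, 2e-2], observed=[1e-4, 5e-5, 2e-3])
print(scores)  # bias_oom = 1.0 -> conservative by ~1 OoM
```

A positive mean bias on the log scale corresponds to the systematic over-prediction (conservatism) reported in the studies discussed below.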

Accuracy and Conservatism of Screening Level Algorithms
The studies summarized below provide a selection of publications that deal with the accuracy and conservatism of VI algorithms, focusing on the comparison between algorithm predictions and observed concentrations. The algorithms discussed in these studies are listed in Table 1. At the end of this section conclusions are drawn from the publications discussed. For each of the algorithms, except GSI, indoor air concentrations were predicted for diffusive transport and for combined diffusive and convective transport. For both sites and all four algorithms, the observed concentrations were below, or in the lower range of, the predicted indoor air concentrations, and ranged over 1 OoM. The differences were explained by the transport processes the algorithms include (convective air flow), differences in the parameter assumptions made, different ways of expressing the mathematical solutions for the transport processes, and limitations in obtaining values for input parameters.
A volatilization algorithm was presented by Turczynowicz and Robinson (2001) for the case of worst-case exposure in a crawl-space dwelling, as this is typical for Australia. The algorithm predicts indoor concentrations by applying a 1D transport model with an infinite source. Data from Australian sites were used to address dilution, ventilation, and first-order degradation in soil and air (Turczynowicz & Robinson, 2007). The sensitivity analysis revealed that mainly building parameters drive the VI predicted by the volatilization algorithm, and to a lesser extent the soil and physico-chemical properties (Turczynowicz & Robinson, 2001).

Hers et al. (2002) reviewed a set of algorithm characteristics and their variation. Observations from four sites were compared with predictions from the GSI, Risc, SVIM, and VolaSoil algorithms. The results show that the VI process is sensitive to particular input parameters and processes, such as biodegradation and convection. The algorithms that, besides diffusion, also incorporate convection based on the Johnson and Ettinger model (JEM) (Johnson & Ettinger, 1991) predict indoor air concentrations for aromatic hydrocarbons up to 2 OoM higher than the observed concentrations. Caution is required when applying algorithms to highly convective soils. It was concluded that the uncertainty in parameter values and in the algorithms themselves contributes significantly to the variation.
In evaluating the Johnson and Ettinger model (JEM), Hers et al. (2003) presented a review of previously published data from several field sites contaminated with chlorinated compounds and BTEX (benzene, toluene, ethylbenzene, xylenes). Sources ranged from 0.5 m to 10.7 m below building foundations and included both groundwater and soil gas sources. For petroleum hydrocarbon sites, measured vapour attenuation factors ranged from approximately 10⁻⁷ to 10⁻⁵. For chlorinated solvent sites, groundwater attenuation factors were in the order of 10⁻⁵ to 10⁻⁴ for the most reliable data sets. The authors concluded that, for almost all cases, the best-estimate JEM-predicted attenuation factors were 1 to 2 OoM (Sanders & Hers, 2006) more conservative than the median measured values.
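The attenuation factor α evaluated in these comparisons is the ratio of the indoor air concentration to the source vapour concentration. A minimal sketch of the steady-state JEM expression (Johnson & Ettinger, 1991) follows; all parameter values shown are illustrative assumptions, not data from the cited sites:

```python
import math

def jem_alpha(D_eff, A_b, L_t, Q_b, Q_soil, L_crack, D_crack, A_crack):
    """Steady-state Johnson & Ettinger attenuation factor (indoor/source).
    D_eff, D_crack: effective diffusion coefficients [m2/s]
    A_b, A_crack:   building footprint / crack area [m2]
    L_t, L_crack:   source-foundation distance / foundation thickness [m]
    Q_b, Q_soil:    building ventilation / soil gas convection [m3/s]"""
    a = (D_eff * A_b) / (Q_b * L_t)
    xi = (Q_soil * L_crack) / (D_crack * A_crack)  # Peclet-like crack number
    e = math.exp(xi)
    return (a * e) / (e + a + (D_eff * A_b) / (Q_soil * L_t) * (e - 1.0))

# Illustrative (assumed) inputs: 100 m2 slab, source 3 m below, modest convection
alpha = jem_alpha(D_eff=1e-6, A_b=100.0, L_t=3.0, Q_b=0.034,
                  Q_soil=8.3e-5, L_crack=0.15, D_crack=1e-6, A_crack=0.1)
print(f"alpha = {alpha:.1e}")  # indoor = alpha * source vapour concentration
```

With the convective term set to zero (`Q_soil` → 0) the expression reduces to a diffusion-only attenuation factor, which is why differences between diffusive and combined diffusive/convective predictions arise in the comparisons above.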
The VolaSoil algorithm from Waitz, Freijer, Kreule, and Swartjes (1996) was adapted and published in Bakker, Lijzen, and van Wijnen (2008), and had previously undergone validation against site data as described in Wijnen and Lijzen (2006). The latter revealed a correlation between the concentrations in groundwater and in the air of soil, crawl space, and house for tetrachloroethylene (PCE). The accuracy of the algorithm for predicting the indoor air concentration was considered good, as it generally predicts concentrations within 1 OoM of the observed concentrations. The algorithm was less accurate (it over-predicted) for vinyl chloride and cis-1,2-dichloroethene (DCE), but more accurate for PCE. For trichloroethylene (TCE), both over- and under-predictions were observed for the soil air (overall < 1 OoM, maximum 2 OoM) and indoor air (Sanders & Hers, 2006).

Provoost et al. (2008b) investigated the accuracy of seven VI algorithms for groundwater contaminated with VOCs. All algorithms had a tendency to predict higher concentrations than observed, by at most 3 OoM. The JEM and Vlier-Humaan algorithms were most accurate in predicting the soil air concentration, while the DF algorithms from Sweden and Norway were least accurate. The algorithms most accurate for indoor air, according to the three criteria outlined in Loague and Green (1991), were JEM, Vlier-Humaan, and CSoil. The DF algorithm from Norway frequently over-predicts indoor air concentrations, while the DF algorithm from Sweden both over- and under-predicts, and was considered less accurate. The algorithms JEM, Vlier-Humaan, and CSoil were considered accurate while maintaining a sufficient level of conservatism, and therefore suitable for human health risk assessment and the derivation of SSV within contaminated land management.
The objective of Provoost et al. (2009) was to compare predictions and observations from seven selected VI algorithms for VOCs present in the vadose zone. Accuracy was determined by applying the criteria from Loague and Green (1991). The algorithms over-predicted the soil air concentration more (< 3 OoM) than the indoor air concentration. Accurate algorithms for predicting the soil air concentration were JEM, Vlier-Humaan, and VolaSoil, and for the indoor air concentration JEM and Vlier-Humaan. All algorithms had a tendency to frequently over-predict the indoor air concentration, with the exception of JEM and Vlier-Humaan, which also frequently under-predicted. Algorithms that are accurate but still conservative were CSoil, VolaSoil, and Risc; these are suitable for SSV derivation and site-specific risk assessment.
Johnston and MacDonald (2010) applied a probabilistic approach to the JEM for the TCE and PCE contaminants in the groundwater plume originating from the former Kelly Air Force Base in San Antonio, Texas. Indoor air concentrations were predicted for just over 30,000 houses, and for a small number of these, indoor air concentrations were measured. The JEM was applied using the parameter set proposed by the US EPA and using an alternative parameter set. Predicted mean TCE concentrations show that in just under 6% of the houses the screening value is exceeded, and in around 85% of houses the 95th-percentile concentration exceeds the screening value. For PCE, almost 50% of the mean predicted concentrations exceed the screening value, and in 99% of houses the 95th-percentile value exceeds the screening value. The alternative parameter set produces slightly higher indoor air concentrations. When observations and the 95th-percentile predicted concentrations were compared, they were in closer proximity than the 50th-percentile predicted concentrations, which underestimated the indoor air concentrations.
The purpose of Provoost et al. (2013) was to present a probabilistic assessment, with sensitivity analysis using Monte Carlo simulation, for the selected VI algorithms used in various regulatory frameworks for contaminated land management. The Risc algorithm was excluded, as its available form did not allow the application of a probabilistic approach. For each of the algorithms, a deterministic approach with the default parameter set was evaluated against the results of the probabilistic assessment. To determine their suitability for regulatory purposes, the algorithms were ranked according to accuracy and conservatism. The conservatism relates to the prevalence of false-negative errors for the predicted indoor air concentrations. The findings from this study suggest that the screening-level algorithms with a degree of conservatism (fewer false-negative predictions) are the JEM, the DF algorithm from Norway, VolaSoil, and Vlier-Humaan. Of these four algorithms, the JEM and VolaSoil also have a high accuracy (discriminative power). Different parameters, which can be variable or uncertain, contribute to the variation in indoor air concentration, and differences were observed between algorithms and between aromatic and chlorinated hydrocarbons.
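A minimal Monte Carlo sketch of such a probabilistic assessment is shown below. The simplified diffusion-only attenuation model and the lognormal parameter distributions are assumptions for illustration, not those of the cited algorithms:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000
c_source = 100.0  # source vapour concentration [ug/m3], assumed fixed

# Hypothetical lognormal distributions for variable/uncertain inputs
D_eff = rng.lognormal(mean=np.log(1e-6), sigma=0.5, size=n)   # diffusion coeff. [m2/s]
Q_b   = rng.lognormal(mean=np.log(0.034), sigma=0.3, size=n)  # building ventilation [m3/s]
L_t   = rng.lognormal(mean=np.log(3.0), sigma=0.2, size=n)    # source depth [m]
A_b   = 100.0                                                  # building footprint [m2]

# Simplified diffusion-only attenuation factor (no convective term)
a = (D_eff * A_b) / (Q_b * L_t)
c_indoor = a / (1.0 + a) * c_source

# Deterministic run with the central (default) parameter values
a_det = (1e-6 * A_b) / (0.034 * 3.0)
deterministic = a_det / (1.0 + a_det) * c_source

p50, p95 = np.percentile(c_indoor, [50, 95])
print(f"median {p50:.3g}, 95th percentile {p95:.3g}, deterministic {deterministic:.3g} ug/m3")
```

Comparing the deterministic prediction with the 95th percentile of the simulated distribution indicates the level of conservatism, as defined earlier; a deterministic value below the 95th percentile signals potential false-negative errors.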
The aforementioned publications reveal that accuracy and conservatism in predicting the indoor air concentration differ depending on the site investigated or the values selected for the input parameters. In these studies, differences regarding conservatism and accuracy between the VI algorithms can be observed. For example, Provoost et al. (2008b, 2009) show that some of the screening-level algorithms (VolaSoil, CSoil, Risc) tend to strike a better balance between accuracy and conservatism than others (JEM, Vlier-Humaan, VolaSoil), while some (DF Sweden and Norway) seem too conservative, thus increasing the probability of producing false positives. The results from Provoost et al. (2013) suggest that the JEM, Vlier-Humaan, and VolaSoil are accurate and conservative for predicting the indoor air concentration, thus promoting the same algorithms as Provoost et al. (2008b, 2009). Keeping the right balance between accuracy and conservatism will determine the suitability of screening-level algorithms for health risk assessments at contaminated sites. To promote uniformity in contaminated land management, it is recommended in Provoost et al. (2008a), as well as by Swartjes (2007) and Provoost, Cornelis, and Swartjes (2006), to "construct a toolbox for the prediction of human health risk exposure that includes different algorithms for which fixed and flexible input parameters are made available. Fixed algorithm parameters are standardized and applied uniformly, for example physico-chemical parameters, while flexible input parameters permits the user to include regional or country specific parameter values and/or policy decisions, for example geographical, ethnological and cultural differences or variation in soil and building properties".
Fisher, Ireland, Boyland, and Critten (2002) proposed the use of more than one algorithm to encompass algorithm uncertainty and improve best practice. Findings from the literature support the use of more than one algorithm, as it provides quality control, explains differences, improves the credibility of the algorithms, and better deals with variation in input parameters and the resulting ranges of air concentrations (Provoost et al., 2013).

Application of Screening-Level Algorithms in Deriving SSV
Screening-level algorithms need to predict close to reality (i.e. have high accuracy) so that credible concentrations are calculated to underpin site assessment and remediation decisions (Robinson & Turczynowicz, 2005). On the other hand, the derivation of SSV requires a sufficient level of conservatism to account for variation in sites, soil, buildings, residents, etc. Some publications propose the use of the 95th percentile of a probabilistically derived distribution. A probabilistic approach provides a distribution of possible concentrations and is therefore more representative of the actual outcomes. If applied to site-specific risk assessments, a probabilistic risk assessment provides more representative ranges, as more site conditions are taken into account (ITRC, 2008). As pointed out before, the US EPA recommends the 95th percentile of the risk distribution as the default percentile for risk-management decision making (EPA, 2001; Ferguson, 1999). Provoost et al. (2008b, 2009) suggest that the current generation of screening-level algorithms is sufficiently conservative, but could as a result lead to (unnecessarily) high predicted indoor air concentrations and thus to (unnecessarily) low SSV. Frequently, as pointed out in Provoost et al. (2008b, 2009), observed concentrations are lower than predicted concentrations. Provoost et al. (2008a) report that the range between the SSV from various countries was 1 to 4 OoM, depending on the VOC assessed and the algorithm used. The algorithm parameters were subdivided into scientific (for example solubility, vapour pressure), political (for example toxicological health limits), and geographical (for example dwelling or soil properties) elements. Two algorithms (the JEM and the DF algorithm from Sweden) were used to demonstrate the effect of increasingly harmonizing both the political and scientific parameters, while the geographical parameters varied. SSV were derived with and without parameter harmonization.
"Results show that the algorithms plus other scientific and political parameters are suited for harmonization. The variation decreases to below 1 OoM, after scientific and political parameters were harmonized". The reduction in the variation of SSV obtained by harmonizing the geographical parameters was limited when compared to the scientific and political parameters. The study suggests that algorithms and their parameters can potentially be harmonized.
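The derivation of an SSV essentially inverts the forward prediction: given a tolerable indoor air concentration, the attenuation and partitioning steps are run backwards to an allowable source concentration. A minimal sketch follows, in which the function, its inputs, and all numbers are illustrative assumptions rather than any country's actual derivation procedure:

```python
def groundwater_ssv(c_indoor_limit, alpha, henry_dimensionless):
    """Back-calculate a groundwater screening value [ug/L] from a tolerable
    indoor air concentration [ug/m3], a vapour attenuation factor alpha [-],
    and a dimensionless Henry constant [-]."""
    c_soil_air = c_indoor_limit / alpha           # required source vapour conc. [ug/m3]
    c_water = c_soil_air / henry_dimensionless    # equilibrium groundwater conc. [ug/m3]
    return c_water / 1000.0                       # convert ug/m3 to ug/L

# Illustrative (assumed) numbers: a 1 OoM difference in alpha between
# algorithms shifts the derived SSV by the same 1 OoM
for alpha in (1e-3, 1e-4):
    print(alpha, groundwater_ssv(c_indoor_limit=5.0, alpha=alpha,
                                 henry_dimensionless=0.27))
```

This makes explicit why the 1 to 4 OoM spread between algorithm predictions translates directly into a comparable spread in the SSV derived from them.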
A further relevant factor in the context of deriving SSV is the use of Henry's law in screening-level algorithms. Provoost et al. (2011) present a study of Henry's equilibrium partitioning between (ground)water and (soil) air, which compared the observed with the predicted soil air concentration. VI algorithms typically include phase-partitioning calculations for VOCs by applying Henry's law to predict the soil air concentration of a particular contaminant. A series of column experiments was conducted with various toluene concentrations in artificial (ground)water to contrast the predicted and observed (soil) air concentrations. The experiments that excluded soil material (water only) showed toluene fugacity behaviour roughly in line with Henry's law, whereas the experiments that included soil material resulted in equilibrium soil air concentrations around 1 OoM lower than expected from a Henry's law-based estimation. It was concluded that the inclusion of Henry's law in VI algorithms does not provide an adequate description of toluene volatilization in soils and may lead to an overestimation of health risk. Instead, a model based on a simple description of the relevant intermolecular interactions could be explored to improve current screening-level algorithms for VI (Goss, Buschman, & Schwarzenbach, 2001; Goss, 1994).
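The Henry's law partitioning step that the column experiments tested can be sketched as below. The dimensionless Henry constant for toluene (~0.27 at 25 °C) is a typical literature value, and the factor-of-10 "with soil" line merely illustrates the reported ~1 OoM discrepancy; it is not a validated correction:

```python
def soil_air_from_groundwater(c_water_ug_per_L, H_dimensionless):
    """Equilibrium soil air concentration [ug/m3] predicted by Henry's law
    from a groundwater concentration [ug/L]: C_air = H' * C_water."""
    return H_dimensionless * c_water_ug_per_L * 1000.0  # ug/L -> ug/m3

c_water = 50.0    # toluene in groundwater [ug/L], assumed for illustration
H_toluene = 0.27  # dimensionless Henry constant at ~25 C (literature value)
predicted = soil_air_from_groundwater(c_water, H_toluene)

# The cited experiments with soil present observed concentrations roughly
# 1 OoM below this Henry's-law prediction:
observed_like = predicted / 10.0
print(f"Henry prediction: {predicted:.0f} ug/m3, "
      f"with-soil observation ~{observed_like:.0f} ug/m3")
```

If this pattern holds beyond toluene, screening values back-calculated through Henry's law would be roughly an order of magnitude stricter than the soil system warrants.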

Place of Screening-Level Algorithms in a Tiered Approach
The comparison of predictions to observed site data has contributed to an improved tiered approach for sites where the use of the algorithms may not be appropriate (Provoost et al., 2009). This may benefit practitioners conducting human health risk assessments at contaminated sites (Hers, 2004). The tiered approach proposed by Provoost et al. (2009) may assist regulators to better understand the general link between the conceptual site model (CSM) and the variation that results from temporal and spatial uncertainty and variability in parameter values. This approach can, in addition, assist in explaining results that are subject to variation. Provoost et al. (2009) provide a tiered approach, from a generic conservative to a site-specific assessment, that links the assessment to data collected on site (Johnson, Luo, Dahlen, & Holton, 2011). Nevertheless, further development of a tiered approach seems to be problematic. Regulators may want to avoid false-negative predictions, in which case the screening-level algorithm needs to possess high conservatism while accuracy is of less importance. On the other hand, parties responsible for the contamination may require predictions that are highly accurate (a minimum of false-positive or false-negative predictions) with a minimum of conservatism (McAlary & Johnson, 2009). Also, parties responsible for the contamination tend to prefer exposure that is just below the regulatory tolerable concentration, while the impacted parties, such as residents, tend to prefer the lowest possible concentration (Siegel, 2005).

Suggestions for Further Research
Screening-level algorithms need to be further field-validated against a more extensive series of sites and contaminants. Some field validation has been undertaken for the JEM by Weaver and Tillman (2005), for four screening-level algorithms by Crump, Hartless, Ross, Scivyer, Davidson, and Pout (2005), and in Provoost et al. (2008b, 2009, 2013). It would be helpful to extend the field validation to well-documented sites from different countries, similar to what Turczynowicz and Robinson (2007) did for Australia.
Few well-documented sites, with spatial and temporal data in a variety of media (soil, air, house), are available in the public domain to verify algorithms. For most sites the input parameters for the algorithms are not measured, and furthermore the measured concentrations, in for example the soil, basement, and/or indoor air, are active air measurements, which provide point estimates. Passive measurements could provide a more time-weighted average concentration (Hodny, Whetzel, & Anderson, 2009). In addition, the limited data available show great variability. It would be useful to extend the number of well-documented sites.
As concluded in Provoost et al. (2013), a probabilistic risk assessment should be applied to a wider range of sites (soil types and contaminants). Conservatism needs to be further investigated for at least the hydrocarbons, and especially for chlorinated hydrocarbons. Additional investigations into the accuracy of algorithms could further clarify which algorithms are accurate for site-specific risk assessments and which algorithms are more conservative and therefore serve well for the derivation of SSV.
A better understanding and conceptualization of temporal changes in the soil and building is needed. Provoost et al. (2009) and Hers et al. (2002) provide a tiered approach for linking the VI pathway assessment with questions, data collection, and data analysis/predictions. However, this tiered approach needs to be further investigated, applied, and refined.

Provoost et al. (2011) have stated that the application of the Henry constant for the calculation of the soil air concentration from groundwater pollution needs to be looked into further. The study states that, for toluene, the use of Henry's law as currently included in VI algorithms does not provide an adequate description of volatilisation in soil and may lead to an overestimation of health risk. More chemicals need to be included in the verification of the application of the Henry constant, preferably chlorinated VOCs, and in addition more aromatic VOCs. If the findings are confirmed for other chemicals, a model based on a simple description of the relevant intermolecular interactions could be explored and perhaps introduced into the current screening-level algorithms for VI.
Several publications, for example Johnson, Hertz, and Beyers (1990), McHugh et al. (2010), Provoost et al. (2010), DeHate, Johnson, and Harbison (2011), Verginelli and Baciocchi (2011), and McAlary, Provoost, and Dawson, have hinted that biodegradation can play an important role under certain circumstances. Further research is needed to clarify when biodegradation in soil (and indoor) air needs to be accounted for. Possibly a set of indicators can be developed that specifies when biodegradation occurs, resulting in a substantial reduction of VI.