Spatial Distribution of Calibrated WOFOST Parameters and Their Influence on the Performances of a Regional Yield Forecasting System

In this study we investigate (i) a redefinition of crop variety zonations at a spatial scale of 10x10 km, and (ii) the influence of recalibrated crop parameters on regional yield forecasting of winter wheat and grain maize in western Europe. The baseline zonation and initial crop parameter set were derived from the operational European crop growth monitoring system (CGMS), which is built around the agrometeorological model WOFOST. Air temperature data from 325 weather stations over the 1992-2007 period were used to define new zonations in a 300 x 300 km test site. The two parameters with the strongest influence on the phenological development stages (TSUM1 and TSUM2, the effective air temperature sums from emergence to anthesis and from anthesis to maturity, respectively) were selected and calibrated. The CGMS was then run with the recalibrated parameters, and simulated crop status indicators were compared with official statistics over the 2000-2007 period. Our results showed that the days of anthesis and maturity were simulated with coefficients of determination (R²) ranging from 0.22 to 0.87 for both crops over the study site. A qualitative assessment of maximum leaf area index and harvest index also revealed a more consistent spatial pattern in the simulation results than with the initial zonation. Finally, recalibrated TSUM1 and TSUM2 led to improved relationships between official yield and simulated crop indicators (significant R² in 17 out of 28 and in 14 out of 59 NUTS3 regions with respect to the best predictor for grain maize and winter wheat, respectively).


Introduction
The need for timely and accurate information on crop yield and production has led to the development of several forecasting systems applied at various temporal and spatial scales. In Europe, the MARS crop yield forecasting system (MCYFS; van Diepen, 1992; Vossen & Rijks, 1995; Genovese, 1998; Boogaard et al., 2002) is operationally used for monitoring the major crops and pasture at European administrative levels (NUTS level 2, 1, and 0; Genovese & Bettio, 2004). The MCYFS is based on simulations of the crop growth monitoring system (CGMS). For crops, the CGMS allows regional application of the WOFOST model (World Food Study; van Diepen, Wolf, & van Keulen, 1989; van Ittersum et al., 2003) by providing a framework which handles model inputs (weather, soil, and crop parameters), model outputs (namely crop indicators such as total biomass, grain yield and leaf area index), aggregation to statistical regions, and yield forecasting at these different administrative levels. The performance of such a crop growth model in yield forecasting depends both on the model's ability to reproduce the effects of environmental conditions and crop management practices, and on a proper aggregation of simulation results for individual land units towards higher aggregation levels. The spatial scale of input data is a major concern in such systems because the model's response is often non-linear. Spatial aggregation of inputs prior to running the model may therefore give different results than aggregating model outputs (Hansen & Jones, 2000; Mearns, Easterling, Hays, & Marx, 2001). Scale and uncertainty of the meteorological forcing have a large impact on model simulations through non-linear interactions, and factors that are relatively static in time (i.e., soil, crop, management practices) contribute less to the in-season crop yield forecast than dynamic factors (i.e., weather). Most efforts in spatial scale-related research for regional crop growth modelling have therefore been dedicated to defining the optimal scale of the meteorological forcing used to drive the model (Easterling, Weiss, Hays, & Mearns, 1998; Challinor, Slingo, Wheeler, Craufurd, & Grimes, 2003; de Wit, Boogaard, & Diepen, 2005). This is especially true when the model outputs are adjusted to historical official statistics at individual administrative units using multiple linear regressions, which is a common approach in current operational forecasting tools (Supit, 1997). Acceptable yield forecasts might then mask the inaccuracy of some static factors, mainly that of their regional distribution. This spatial distribution is therefore an important boundary condition for successful crop yield prediction, particularly when coupling crop growth models with Earth observation satellite data (Dorigo et al., 2007).
Regional calibration of a crop growth model is defined as the calibration of model parameters to match the observed response at the regional scale (Hansen & Jones, 2000). When the individual simulation units of the modelling domain become smaller, the calibration approach should reflect the local conditions. Calibration using regional statistics is then no longer appropriate because these statistics do not represent local heterogeneity and lump many factors together. More attention is consequently paid to the spatial distribution of crop parameter values in order to represent local effects. Different procedures with varying assumptions and complexity exist and have led to quite acceptable approaches for regional calibration of crop model parameters. For example, an automated calibration platform (CALPLAT) has been developed and used for calibrating WOFOST parameters at the European NUTS2 level (Wolf et al., 2011). Moreover, with the advances in technology and data processing, a large amount of satellite remote sensing (RS) data is now available. Owing to their synoptic, timely and repetitive coverage, RS data have been recognized as a valuable source of information for regional crop growth modelling. Accordingly, model performance can be significantly improved through RS data assimilation (Moulin, Bondeau, & Delecolle, 1998; de Wit & van Diepen, 2007; de Wit, Duveiller, & Defourny, 2012).
Although yield forecasts at an intermediate scale (i.e., the administrative NUTS3 level of the European Union) are essential for decision makers at national and sub-national level as well as for public institutions and private companies, calibration procedures at this scale are still lacking. This paper aimed at (i) calibrating two major WOFOST parameters at the European NUTS3 administrative level, and (ii) assessing the effect of the spatial distribution of these newly calibrated parameters on the performance of the European operational CGMS in regional yield forecasting at this scale. The effect of improved zonations and crop parameter values on crop yield simulation for winter wheat and grain maize was assessed over a test site of 300x300 km in western Europe. Our approach for aggregating the crop growth model parameters at regional scale was a local application of the agro-ecological zoning strategy (Fischer & van Velthuizen, 1999). The simulation results of the CGMS run with default and with recalibrated parameters were investigated, and the performance of each was benchmarked in terms of correlation with crop yields reported by the national statistics offices at the NUTS3 level. This paper served as a preliminary study in the assessment of WOFOST performance at the European NUTS3 level and was based on data retrieved from field experiments. The use of RS data for such analysis is discussed as part of future work.

Study Area and Data
The study area was a square of 300x300 km including Belgium, northern France, Luxembourg and the bordering areas of Germany and the Netherlands. This test site involves a climatic gradient that needs to be taken into account in the regionalization of the crop model. Although different countries were included in the test site, the results at NUTS3 level were analyzed only for the Belgian and northern French regions.

Crop Simulation Model, Soil and Weather Inputs Data
WOFOST is the core of the CGMS. It is a mechanistic crop growth model that describes plant growth by using light interception and CO2 assimilation as growth driving processes and by using crop phenological development as growth controlling process (van Diepen et al., 1989; van Ittersum et al., 2003). The model can be run in different modes: (i) a potential mode, where crop growth is purely driven by temperature and solar radiation and no growth limiting factors are taken into account; and (ii) a water-limited mode, where crop growth is limited by the availability of water. The difference in yield between the potential and water-limited mode can be interpreted as the effect of drought. Currently, no other yield-limiting factors (nutrients, pests, weeds) are taken into account.
Soil inputs used in this study were derived from the 1:1 000 000 European soil map (Challinor et al., 2003). The 10x10 km grid was overlaid with the soil data in order to determine which soils were present in each grid cell.
Soil moisture contents at different pressure heads (saturation, field capacity, wilting point) were estimated from the soil physical description. This information was used for the water balance calculations needed in the water-limited mode. Besides estimating soil hydraulic properties, the soil map is also used for estimating the suitability of soils for different crops.
Daily meteorological information (minimum and maximum air temperature, wind speed, vapour pressure and precipitation) was derived from 325 weather stations over the 1992-2007 period. Solar radiation was only available from a limited number of stations. It was therefore estimated at station level using relationships involving daily temperatures, cloud duration or sunshine hours (Supit & van Kappel, 1998).
Density of weather stations was considerably higher in Belgium than in northern France. Weather variables were interpolated on a 10x10 km grid using the methodology applied in CGMS (Supit, 2000). Each cell received values for air temperature, radiation, vapour pressure, wind speed and calculated evapotranspiration, as an average from suitable surrounding weather stations. Determination of the most suitable weather stations was based on the so-called "meteorological distance" (van der Voet, van Diepen, & Oude Voshaar, 1994). In the case of rainfall, a grid cell received the value of the weather station with the smallest meteorological distance from the grid cell. This method was chosen in order to avoid the distortion of precipitation sequences caused by averaging precipitation values from multiple weather stations.

Observations From Crop Experiments
Observations of day of planting (DOP), day of emergence (DOE), day of anthesis (DOA), day of maturity (DOM), and day of harvest (DOH), all expressed as day of the year, from winter wheat and grain maize experiments were used for crop parameter calibration. The observations were gathered from variety trials of seed companies or agricultural research institutes in Belgium and France. The final database consisted of 66 experiments for winter wheat and 140 for grain maize. Incomplete experiments were retained and used for deriving the distribution of sowing, maturity and/or harvest dates. All experiments took place between 2000 and 2007 and involved different genotypes. Figure 1 shows the regional distribution of the retained experiments for both crops. Useful experiments for winter wheat were only available in northern France and Luxembourg. The grain maize experiments were more evenly distributed over the study area. Note that the total number of points in Figure 1 is much smaller than the total number of experiments since many experiments took place at the same location.
Figure 1. Overview of the study area with location of weather stations and crop experiments. Crop experiments that were used for calibration are marked in red. Thick and thin lines refer to country and NUTS3 boundaries, respectively.

Hierarchical Approach for Regional Calibration
To recalibrate the system, model parameters related to phenological development were calibrated first in order to 'synchronize' the simulated development stages with the observations. Parameters related to potential production levels (e.g. assimilation rate) were considered next, and finally the parameters related to water-limited production levels (e.g. critical soil moisture level). For this latter step the spatial distribution of crop varieties was refined during the calibration, as phenological responses should be more spatially homogeneous than water stress responses.

Defining the Spatial Distribution of Crop Varieties
The spatial classification was based on a 10x10 km grid. Given the limited availability of experimental crop data (crop phenological stages), only the major crop parameters that influence the phenological development stages were explored, i.e. TSUM1 (effective temperature sum from emergence to anthesis) and TSUM2 (effective temperature sum from anthesis to maturity). The effective temperature is a modified daily average temperature which takes into account temperature thresholds and the non-linear response of the development rate to temperature (van Diepen et al., 1989). The spatial distribution of crop varieties was mostly determined by climatic gradients (North-South increases in temperature sums) as well as by topographic features (the Ardennes massif with higher altitude and lower temperature sums). Length of growing period was strongly correlated with the sum of air temperatures. Therefore, thresholds calculated for each climatic grid cell and relating to the long-term daily mean temperatures were used to define zones of equal variety. Rainfall distribution or meteo-derived indices such as water balance or length of the growing period (widely used in the FAO agro-ecological zoning methodology; Fischer & van Velthuizen, 1999) might also be used as explanatory variables in the zonation process. However, rainfall was found not to be a strong limiting factor with respect to yield. For winter wheat the length of growing period (expressed in °C.day, with a base temperature of 0 °C) was calculated from 1 January to 30 June for each study year (TSUM range 795-1516 °C.day). Preliminary analyses showed that most of the parameter variability occurred in the vegetative stage of crop development. The aggregation period therefore corresponded to this stage over our modelling domain. For grain maize, the length of growing period was calculated from 1 April to 1 October for each study year with a base temperature of 6 °C (TSUM range 1208-2071 °C.day). The difference in aggregation periods also took into account the difference in planting and harvesting dates of the two crops.
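The temperature sums used for the zonation can be illustrated with a minimal sketch. It assumes a plain linear degree-day accumulation above the base temperature (0 °C for winter wheat, 6 °C for grain maize); it is not WOFOST's effective temperature function, which also applies upper thresholds and a non-linear response. The function name and the sample values are hypothetical.

```python
def temperature_sum(daily_mean_temps, base_temp):
    """Accumulate degree-days (°C.day) above base_temp.

    daily_mean_temps: daily mean air temperatures (°C) over the
    aggregation period, e.g. 1 January to 30 June for winter wheat.
    Days colder than base_temp contribute nothing to the sum.
    """
    return sum(max(0.0, temp - base_temp) for temp in daily_mean_temps)


# Hypothetical four-day series, evaluated with both base temperatures.
sample = [-2.0, 4.0, 10.0, 16.0]
wheat_sum = temperature_sum(sample, base_temp=0.0)   # 0 + 4 + 10 + 16
maize_sum = temperature_sum(sample, base_temp=6.0)   # 0 + 0 + 4 + 10
```

In a zonation workflow this sum would be computed per grid cell and per year, then averaged over the study years before thresholds are applied.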
The following rules were used as guidelines to define the number of zones and the temperature sum thresholds: (i) the spatial patterns of the resulting variety zonation should reflect known characteristics of the modelling domain; (ii) the variety zonation should consist of contiguous zones as much as possible. Patchy displays of zones should be avoided.

Parameter Calibration Approach
The calibration approach over all newly defined variety zones aimed at minimizing the root mean squared error (RMSE) between observed and predicted dates of phenological stages. To calibrate TSUM1, all crop experiments within a variety zone with an observed DOP and/or DOE, and an observed DOA, were used. When the crop experiment only contained an observed DOP, the DOE was estimated from this DOP. For the calibration of TSUM2, all crop experiments with an observed DOA and an observed DOM and/or DOH were used. Again, when the crop experiment only contained DOH, the DOM was estimated by subtracting 7 days from the observed DOH. Although the DOH depends highly on management practices and favourable weather conditions, we assumed that a farmer left the crop in the field for one week on average after physiological maturity.
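The calibration idea described above can be sketched as a simple search over candidate temperature sums. This is an illustrative sketch only, not the procedure actually implemented in the study: the grid search, the function names, and the representation of an experiment as a series of daily effective temperatures paired with an observed stage duration are all assumptions. Only the 7-day maturity-to-harvest lag comes from the text.

```python
import math

HARVEST_LAG_DAYS = 7  # lag between maturity and harvest assumed in the text


def estimate_dom(doh):
    """Estimate day of maturity from an observed day of harvest."""
    return doh - HARVEST_LAG_DAYS


def days_to_reach(tsum, daily_effective_temps):
    """Days until the running temperature sum first reaches tsum.

    Returns None when the sum is never reached within the series.
    """
    total = 0.0
    for day, temp in enumerate(daily_effective_temps, start=1):
        total += temp
        if total >= tsum:
            return day
    return None


def rmse_for_tsum(tsum, experiments):
    """RMSE (days) between observed and predicted stage durations.

    experiments: list of (daily_effective_temps, observed_days) pairs.
    """
    squared_errors = []
    for temps, observed_days in experiments:
        predicted = days_to_reach(tsum, temps)
        if predicted is None:
            predicted = len(temps)  # stage not reached: use series length
        squared_errors.append((predicted - observed_days) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def calibrate_tsum(experiments, candidates):
    """Return the candidate temperature sum with the lowest RMSE."""
    return min(candidates, key=lambda tsum: rmse_for_tsum(tsum, experiments))


# Two hypothetical experiments in one variety zone.
zone_experiments = [([10.0] * 100, 80), ([20.0] * 100, 40)]
best_tsum = calibrate_tsum(zone_experiments, range(700, 901, 50))
```

The same routine would apply to TSUM1 (emergence to anthesis) and TSUM2 (anthesis to maturity), with the temperature series starting at the corresponding stage.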
In order to determine the influence of the new variety zones and crop parameters, two CGMS configurations were compared: one using the variety zones and crop parameters inherited from the operational European CGMS (referred to as 'classic'), and one using the new variety zones and recalibrated crop parameter values (referred to as 'recalibrated').

Reported Official Yield and Acreage Data
Belgian official yield statistics were provided by the National Statistics Institute (http://www.statbel.fgov.be) and completed through the Belgian Crop Growth Monitoring (B-CGMS) website. They spanned the 1998-2007 period. Although the total number of NUTS3 regions was 43, only 31 were used due to missing data. Agricultural land use and acreage data were derived from the geographical information system layers of the Integrated Administration and Control System (IACS) over the 1998-2007 period.
French crop yield and acreage statistics at NUTS3 level were downloaded from the AGRESTE website (http://agreste.agriculture.gouv.fr/). Cultivated area, yield and harvested products for winter wheat and grain maize were available for 28 NUTS3 regions over the 1998-2007 period. All yield and acreage data were generated by the France Statistical and Economy Regional Services.
No detrending procedures were applied to the reported crop yield statistics due to the shortness of the time series (Belgium) and the lack of trend in the reported crop yields. The statistical data at various administrative levels were used to benchmark the system.

Spatial Aggregation of Simulation Results to NUTS3 Level
Comparing the simulated crop biomass values with official statistical data requires aggregating the simulated values to the scale of the administrative region. Since information on the cultivated area of each crop was lacking, the cultivated area within each grid cell was taken to be the arable land area derived from the CORINE land cover database 2000 (Büttner, Feranec, & Jaffrain, 2002) and assumed to be constant throughout the study period. Aggregation of simulation results to the NUTS3 level was thus performed by weighting each CGMS grid cell within a NUTS3 region by its area of arable land.
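The aggregation step amounts to an area-weighted mean, which can be sketched as follows. The function name, the data layout (one pair per grid cell) and the sample numbers are hypothetical; the weighting by arable land area is the only element taken from the text.

```python
def aggregate_to_region(cells):
    """Area-weighted mean of simulated values over a NUTS3 region.

    cells: iterable of (simulated_value, arable_area) pairs, one per
    grid cell inside the region. Area units only need to be
    consistent across cells.
    """
    cells = list(cells)
    total_area = sum(area for _, area in cells)
    if total_area == 0:
        raise ValueError("region contains no arable land")
    return sum(value * area for value, area in cells) / total_area


# Two hypothetical grid cells: yield in kg/ha, arable area in km2.
regional_yield = aggregate_to_region([(8000.0, 30.0), (6000.0, 10.0)])
```

Cells with little arable land therefore contribute little to the regional value, regardless of their simulated yield.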

Assessment of Simulation Results and Benchmarking
Errors between the simulated and observed DOA and DOM were analysed statistically and graphically, and a qualitative assessment of the average harvest index (HI) and maximum leaf area index (LAI) under potential conditions over the 1992-2007 period was made. Internal consistency checks on the simulation results for both experiments were carried out in order to evaluate whether the simulated crop biophysical variables were within a plausible range and whether the spatial patterns of these variables were consistent, and to derive important information about model behaviour.
Finally, a benchmarking at European NUTS3 level was carried out throughout the 300x300 km test site over the 1998-2007 period (guided by the availability of official yield data). This involved assessing whether the newly calibrated TSUM1 and TSUM2 had a significant and positive impact on the wheat and grain maize yields simulated by CGMS. First, summary statistics were derived for potential biomass (PB), potential grain yield (PG), water-limited crop biomass (WB), and water-limited grain yield (WG). Second, the correlation matrix between the classic and recalibrated indicators was derived. Third, the relationship between official yield statistics and each crop indicator was assessed through a simple linear regression (a trend analysis was performed prior to this assessment). Last, a leave-one-year-out cross-validation (Jansen, 1995) was used to test the performance of the two sets of models derived from classic and recalibrated indicators. For each NUTS3 time series with N years, N-1 data points are used to build a linear regression model, which is then tested against the remaining single data point. This is repeated in N systematic replicates, with the k-th point being dropped in the k-th replicate (in this case k = N) (Cassell, 2007).
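The leave-one-year-out procedure can be sketched for a single NUTS3 time series as below. This is a minimal illustration using an ordinary least-squares line fitted by hand; the function names and the sample series are hypothetical, and the sketch omits the significance testing applied in the study.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def leave_one_year_out(indicator, official_yield):
    """Prediction error for each year when that year is held out.

    indicator: simulated crop indicator per year (e.g. PG or WB).
    official_yield: reported yield for the same years.
    Returns one error (predicted minus observed) per year.
    """
    errors = []
    for k in range(len(indicator)):
        # Drop the k-th year, fit on the remaining N-1 years.
        x_train = indicator[:k] + indicator[k + 1:]
        y_train = official_yield[:k] + official_yield[k + 1:]
        slope, intercept = fit_line(x_train, y_train)
        predicted = slope * indicator[k] + intercept
        errors.append(predicted - official_yield[k])
    return errors


# Hypothetical, perfectly linear five-year series (errors are ~0).
cv_errors = leave_one_year_out([1.0, 2.0, 3.0, 4.0, 5.0],
                               [3.0, 5.0, 7.0, 9.0, 11.0])
```

The resulting list of held-out errors is what the error statistics (RMSE, MAE, RRMSE) are computed from.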
Model performance was assessed using the RMSE, the mean absolute error (MAE), and the relative root mean square error (RRMSE). The RMSE is the standard deviation of the differences between the observed values and the values predicted by the model, while the MAE is the average of the absolute values of these differences. The RRMSE expresses the error as a percentage of the statistical yield. It characterizes the quality of the predictor (Supit, 1997). The assessment was made with all samples, without considering the differences in length of time series between France and Belgium (mainly for winter wheat).
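The three error measures can be written out as a short sketch. The RMSE and MAE follow their standard definitions; for the RRMSE the sketch assumes normalisation by the mean observed yield, which is one common convention and may differ in detail from the formulation in Supit (1997). Function names and sample values are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    squared = [(p - o) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(observed, predicted):
    """Mean absolute error between observed and predicted values."""
    absolute = [abs(p - o) for o, p in zip(observed, predicted)]
    return sum(absolute) / len(absolute)


def rrmse(observed, predicted):
    """RMSE as a percentage of the mean observed value (assumed form)."""
    mean_observed = sum(observed) / len(observed)
    return 100.0 * rmse(observed, predicted) / mean_observed


# Hypothetical two-year example with errors of +2 and -2.
obs = [98.0, 102.0]
pred = [100.0, 100.0]
scores = (rmse(obs, pred), mae(obs, pred), rrmse(obs, pred))
```

Because the RRMSE is scaled by the yield level, it allows error magnitudes to be compared between crops with different yield ranges, such as winter wheat and grain maize.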

Zonations at the 10x10 km Scale Over the Test Site
Equal intervals of temperature sums were used to stratify the test site. In the case of winter wheat, two zones were derived in the new crop variety distribution (Figure 2A). The first (Zone 1) included the Ardennes region in Belgium, Luxembourg and the eastern part of the French test site. The second zone (Zone 2) included the lower parts of Belgium and France and the Paris Basin. The boundary between the two zones followed approximately the 200 m altitude contour line in the North and the 300 m contour line in the South of the study site (Figure 2A). The inclusions of grid cells labelled as Zone 2 within Zone 1 in the south-eastern part could be related to the Mosel valley. In the inherited European CGMS, there were three variety zones within the test site (i.e. Zones 9, 10, and 11, Figure 2B), but the spatial distribution of these variety zones did not clearly follow the climatic gradient. For example, the variety zone in the coldest part of the Ardennes Massif (Zone 9) was also present in other parts of the study site (Figure 2B).
In the case of grain maize, three zones were found in the new crop variety distribution (Figure 3A): a relatively cool climate for the Ardennes region (Zone 6), a warm climate in the Paris basin and Mosel valley (Zone 8), and the remaining area with a moderate climate (Zone 7). The use of three zones was deemed necessary because of the larger temperature sum range over the growing period due to more pronounced climatic differences during the summer period. The variety zonation for grain maize inherited from the operational CGMS (Figure 3B) has two varieties (Zones 12 and 13), which roughly corresponded to a 'southern' and a 'northern' variety, but the 'southern' variety can be found in the north of the test site as well (Figures 2B and 3B).
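Stratification by equal temperature-sum intervals can be sketched as below. This is an assumption-laden illustration: it splits the observed range into equal-width classes and numbers them from coolest to warmest, whereas the zones in the study were additionally guided by the contiguity rules and carry their own labels (e.g. Zones 6 to 8 for grain maize). The function name and the sample values are hypothetical.

```python
def assign_zones(temperature_sums, n_zones):
    """Assign each grid cell to one of n_zones equal-width classes.

    temperature_sums: long-term mean temperature sum per grid cell.
    Returns zone numbers from 1 (coolest) to n_zones (warmest).
    """
    lowest = min(temperature_sums)
    highest = max(temperature_sums)
    width = (highest - lowest) / n_zones
    zones = []
    for value in temperature_sums:
        index = int((value - lowest) / width) if width > 0 else 0
        # The maximum value falls on the upper edge; keep it in the last class.
        zones.append(min(index, n_zones - 1) + 1)
    return zones


# Hypothetical cells spanning the grain maize range quoted in the text.
maize_zones = assign_zones([1208.0, 1400.0, 1700.0, 2071.0], n_zones=3)
```

A real zonation would follow this step with a spatial clean-up so that isolated cells do not produce a patchy map.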

Winter Wheat
Table 1 lists the default and recalibrated TSUM1 and TSUM2 parameter values, as well as the summary statistics on the predicted DOA and DOM. The standard error of TSUM1 and TSUM2 was generally lower for Zone 1 than for Zone 2. Comparing the recalibrated TSUM1/2 values with the default ones, the recalibrated values for Zone 2 were only 3.46% larger than the default values for Zone 10 (total of 2170 °C.day over the growing season); both zones covered the majority of the modelling domain (Figure 2). RMSE for simulated DOA was similar for the different zones (around 7.5 days, Table 1), while the coefficient of determination (R²) was better for Zone 2 (0.87) than for Zone 1 (0.47). RMSE for simulated DOM was lower for Zone 1 than for Zone 2, although the R² for Zone 2 (0.47) was better than that of Zone 1 (0.22). Overall, the DOA and DOM were estimated with an RMSE of 7.6 days and 6.8 days (R² of 0.81 and 0.45), respectively (Table 1).

Grain Maize
The standard errors for the TSUM1/2 calibration of grain maize were smaller than for winter wheat. They ranged from 5.9 to 16 °C.day (Table 1). For example, the recalibrated TSUM1 and TSUM2 were 805 and 836 °C.day, respectively, for Zone 7 (1641 °C.day over the crop growing season), and 782 and 1029 °C.day, respectively, for Zone 8 (Table 1). The recalibrated values were 5% to 13% higher than the defaults. Importantly, the ratio between TSUM1 and TSUM2 also differed markedly from the default, which in turn influences leaf development.
RMSE for simulated DOA was 3.8 and 15.8 days (with corresponding R² of 0.60 and 0.48) for Zones 7 and 8, respectively, whereas the RMSE obtained for simulated DOM was 4.7 and 12.1 days (corresponding R² of 0.78 and 0.55) for Zones 7 and 8, respectively (Table 1).
No observations were available for Zone 6 with which to calibrate TSUM1 and TSUM2. For the crop simulation runs over Zone 6, the recalibrated TSUM1 and TSUM2 for Zone 7 were therefore assigned to that zone. Given the relatively cooler climate of Zone 6, this implied that the simulations will not reach physiological maturity in some cropping seasons. However, this is consistent with known agricultural practices for the Ardennes region (B. Tychon, personal communication). In addition, the acreage of grain maize in this area was very small due to the unfavourable climatic conditions.
Figure 4 shows the comparison between simulated and observed DOA and DOM for winter wheat and grain maize. Overall, the R² for DOA was higher than that for DOM. This can be explained by the fact that DOM was often not observed (DOH was reported instead). The system then estimates DOM from DOH by assuming a fixed lag of 7 days between maturity and harvest, while in practice the DOH varies and depends on actual weather conditions and logistic factors.

Winter Wheat
The average maximum leaf area index (AvgMaxLAI) of winter wheat ranged from 5 to 6 for classic and recalibrated results (Figure 5). The spatial distribution of 'classic' simulation results (performed with default parameter values) showed areas along the northern boundary of the study site with high AvgMaxLAI (between 6 and 7), and other areas with relatively low AvgMaxLAI (between 3 and 4). These patterns in AvgMaxLAI were the result of the spatial distribution of TSUM1 and TSUM2 based on the 50 x 50 km grid cells of the operational CGMS. Likewise, the areas with deviating AvgMaxLAI also had deviating average HI (AvgHI): low values (0.38) in the north and high values (0.55-0.60) in the southern and central parts (Figure 5). In the 'recalibrated' simulation (performed with recalibrated parameter values), the AvgHI was slightly higher (0.48-0.51) than in the 'classic' simulations (0.47-0.48).
Overall, the AvgMaxLAI and AvgHI were in a plausible range for both classic and recalibrated simulations, but the artefacts present in the classic simulation were removed through the recalibration and the new spatial distribution of crop parameters.

Grain Maize
The spatial distribution of AvgMaxLAI for the 'classic' simulations was characterized by high values (ranging from 7.00 to 7.50) in the southern part and low values (between 5.70 and 6.20) in the northern part (Figure 6). In the 'recalibrated' simulation the AvgMaxLAI showed low spatial variability, with values between 6.10 and 6.40 throughout the study area. The spatial patterns of AvgMaxLAI and AvgHI were more realistic for recalibrated results than for classic results. The South to North decrease in AvgHI for the recalibrated results reflects the increasing acreage of maize which is used in fodder production rather than in grain production. The 'recalibrated' simulation results generally showed a decreasing AvgHI from 0.49 in the Paris basin to 0.43 in northern Belgium, and 0.40 along the Belgian/French coast (Figure 6). In both 'classic' and 'recalibrated' simulations the Ardennes region has low AvgHI (0.26-0.30). Although the range of AvgHI is still plausible in the 'classic' simulations, the existence of such systematic differences over the study area (0.30-0.39 in the southern part, and 0.41-0.45 in the northern part) is not realistic.
The high biomass values for the 'recalibrated' simulation reflect high TSUM1/2 values (leading to a longer growing season). The same pattern can be found in PG. In addition, these results demonstrate that winter wheat yields were only marginally affected by drought as WB were only 1461 kg.ha⁻¹ (recalibrated) and 1354 kg.ha⁻¹ (classic).
The correlation matrix between 'classic' and 'recalibrated' results reveals that almost all indicators were highly correlated (Table 3). Correlation was highest between indicators of the same level of simulation (i.e. potential and water-limited levels) and was generally greater than or equal to 0.70.
The correlation matrix between 'classic' and 'recalibrated' simulations shows larger variability in the correlation between indicators (Table 3). First of all, water-limited and potential crop indicators were no longer correlated, demonstrating that drought stress during the growing season influences the temporal and spatial variability of crop indicators. PB was still highly correlated between 'classic' and 'recalibrated' simulated indicators (0.96, P < 0.01), while PG showed a lower R² (0.70, Table 3). The recalibration of TSUM1/2 thus influenced the temporal variability of the simulation results. However, at the water-limited simulation level the drought stress signal was very marked compared with the recalibration effect. WB indicators for 'classic' and 'recalibrated' simulations were nearly identical (R² = 0.99). WG indicators showed an R² of 0.94 (Table 3).

Winter Wheat
Table 4 shows the R² for significant linear regressions (at α = 0.05) between crop model indicators and official yield statistics for winter wheat over the 1998-2007 period. Of the 59 NUTS3 regions of the study area, the regression analysis was significant in 13 regions for the 'classic' simulation and in 14 regions for the 'recalibrated' simulation.
Broken down by indicator, PG accounted for 48%, PB for 20%, WG for 16% and WB for 16% of valid observations. PG has the best relationship with reported crop yields for a large number of NUTS3 regions for both 'recalibrated' and 'classic' simulations. Figure 7 shows the spatial distribution of R² for PG indicators.
Results for both 'classic' and 'recalibrated' had a similar trend characterized by relatively high R² in the southern NUTS3 regions and low R² in the northern part of France and Belgium. In addition, the regions with significant regression results were located in the southern part of the test site. From a qualitative point of view, the map of 'recalibrated' PG shows slightly higher R² in northern France compared with the map of 'classic' PG (Figure 7).
A covariance analysis (not shown here) on the slopes and intercepts of the 'recalibrated' and 'classic' regression lines confirms these results. The two regression lines (official versus simulated yields) were significantly different when the regressors were the PB and PG indicators. Slopes of PB and PG were higher for 'recalibrated' than for 'classic', while the intercept values were not significantly different. The 'recalibrated' results fitted the reported yield statistics better than the 'classic' results for potential indicators (PG and PB). For water-limited conditions (WG and WB), there was no evidence of a difference in slope or intercept between the 'classic' and 'recalibrated' CGMS. The cross-validation for winter wheat (Table 6) resulted in different estimates of generalized errors for each NUTS3 region with significant correlation. RMSE, MAE and RRMSE of CGMS runs with default or recalibrated TSUM1 and TSUM2 were very similar. Results showed that there was no significant difference in performance between the two model runs (P > 0.05). Average RMSE for the study area was 559 and 553 kg.ha⁻¹ for 'classic' and 'recalibrated' simulations, respectively (case of regression with the best PG indicator). The corresponding RRMSE was equal to 6.4%.

Grain Maize
Due to limited official crop yield data for Belgium, only results for France were reported (Table 5). The PB and PG indicators had no significant correlation with reported crop yield and were therefore not provided. The regression analysis was significant for 16 and 17 out of 28 NUTS3 regions in the 'classic' and 'recalibrated' simulations, respectively. WB was the best indicator for final crop yield both for the 'classic' and 'recalibrated' simulations, but the differences were not significant. Correlation was significant for 10 and 15 regions for the 'classic' and 'recalibrated' model runs, respectively. The R² was also higher in 12 out of 15 cases. The spatial distribution of R² for the indicator WG confirms that the 'recalibrated' WG has more regions with significant correlation than the 'classic' WG (Figure 8).
The covariance analysis did not detect any significant difference in slope or intercept of the regression lines between the 'classic' and 'recalibrated' simulation results and official data at NUTS3 level. The cross-validation analysis showed the best results when applied to the regression lines using WB as regressor (Table 6). RMSE values of the 'classic' and 'recalibrated' CGMS results were nearly identical (838 and 831 kg.ha⁻¹, respectively) and did not differ statistically from each other. The same conclusions were reached for the MAE. The RRMSE was higher than that of winter wheat but remained acceptable (< 10%).

Discussion
Calibration of a spatially distributed crop model involves both the estimation of model parameter values and their spatial distribution. Our approach in this study was to define zones of equal variety prior to the calibration of different crop model parameters (through minimizing the error on all available observations within each zone).
The baseline zonation was derived from the operational European CGMS. The definition of the variety zones was driven by expert knowledge. Various aspects are taken into account, such as administrative boundaries, climate gradients or other spatial features (i.e. continentality), depending on the location, resolution and size of the modeling domain, and on the data availability and calibration hierarchy. Although the actual boundaries of the zones might be arbitrary, their exact location is often not important since they are based on gradient thresholds (in our case air temperature sums). Moreover, the availability of data for model calibration can dictate the number and location of variety zones.
Our study focused on two relevant parameters of crop phenology in WOFOST, i.e., TSUM1 and TSUM2, over a test site of 300 x 300 km in Northern Europe. The improvement relies on the values of these two parameters as well as on their spatial distribution. The calibration of TSUM1 and TSUM2 demonstrated that the observed DOA and DOM could be reproduced quite satisfactorily, with R² ranging from 0.22 to 0.87 for winter wheat, and from 0.48 to 0.78 for grain maize (Table 1). The recalibrated TSUM1 and TSUM2 for winter wheat were fairly close to the default values used in the operational CGMS for the majority of the modeling domain. For grain maize, differences were found between the recalibrated and the default TSUM1 and TSUM2, both in value and in ratio.
A qualitative assessment of the simulation results was carried out in order to check their biophysical and spatial consistency. This assessment revealed that the spatial distribution of AvgMaxLAI and AvgHI was more realistic for simulations with recalibrated TSUM1 and TSUM2 parameters for both crops. For winter wheat, the overall patterns were quite similar but some artefacts were removed due to the improved variety zonation. For grain maize, large differences in the spatial distribution of AvgMaxLAI and AvgHI were removed by the recalibrated TSUM1 and TSUM2. The total recalibrated TSUM was higher for both winter wheat and grain maize, which provided 6 to 10 supplementary days for biomass development. As a result, average PB and PG increased.
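The idea of calibrating one parameter per zone by minimizing the error over all observations in that zone can be sketched as follows. This is a simplified illustration, not the procedure used in the study: it assumes a plain degree-day accumulation above a base temperature instead of WOFOST's own phenology routine, and a grid search instead of whatever optimizer was actually used. The names `days_to_reach`, `calibrate_tsum` and `t_base` are hypothetical.

```python
from math import sqrt


def days_to_reach(tsum, daily_temps, t_base=0.0):
    """Days after emergence at which the effective temperature sum reaches tsum.

    Returns None if the sum is never reached within the series.
    """
    total = 0.0
    for day, temp in enumerate(daily_temps, start=1):
        total += max(0.0, temp - t_base)  # only temperatures above t_base count
        if total >= tsum:
            return day
    return None


def calibrate_tsum(observations, candidates, t_base=0.0):
    """Grid search for the tsum minimising RMSE over all observations of a zone.

    observations: non-empty list of (daily_temps, observed_day) pairs,
    one per site-year, with daily_temps starting at emergence.
    """
    best_tsum, best_rmse = None, float("inf")
    for tsum in candidates:
        errors = []
        for temps, observed in observations:
            simulated = days_to_reach(tsum, temps, t_base)
            if simulated is None:
                break  # candidate cannot be reached: skip it
            errors.append(simulated - observed)
        else:
            rmse = sqrt(sum(e * e for e in errors) / len(errors))
            if rmse < best_rmse:
                best_tsum, best_rmse = tsum, rmse
    return best_tsum, best_rmse


# Synthetic zone with two site-years at constant temperature (illustration only).
zone = [([10.0] * 200, 100), ([12.5] * 200, 80)]
print(calibrate_tsum(zone, range(800, 1201, 25)))
```

Pooling all site-years of a zone into one error term is what ties the parameter value to the zonation: changing the zone boundaries changes which observations enter each search.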
Linear regression analyses were used to quantify the impact of the recalibration on the relationship between reported and simulated yields. For winter wheat, a significant correlation was obtained in 13 ('classic') and 14 ('recalibrated') out of 59 NUTS3 regions. In 76% of the regions it was not possible to establish a significant link between any of the four CGMS yield indicators and the official reported yields. Simulated PG was fairly correlated with observed yields in the southern part of the study area (R² varying between 0.3 and 0.5).
The correlation decreased towards the north, so that almost no significant correlation was observed in northern France and Belgium. No significant differences were observed between the 'classic' and 'recalibrated' results. The good performance of PG as predictor could be attributed to a combination of several factors: (1) winter wheat is a fairly drought-tolerant crop and is grown early enough to avoid drought stress throughout the cropping season; (2) overall production levels are high, so that other factors (pests, management), which are not included in the model, play a role in yield variability; and (3) CGMS water balance calculations are uncertain because of uncertainties in rainfall and soil physical properties over the study site (van Diepen, 1992; Hansen & Jones, 2000). Moreover, the weak relationship between CGMS water-limited simulation results and official data in some areas has been underlined by other authors (Vossen, 1992; Supit, 1997). For grain maize, the study was limited to the French NUTS3 regions. In general, the performance of the system was better than for winter wheat, with 17 out of 28 regions showing a significant correlation. Simulated WB was relatively well correlated with observed yields (R² ranging from 0.3 to 0.6). Furthermore, drought is the only limiting factor in CGMS. Grain maize is more drought-sensitive and grows in a drought-prone season. The system usually performs well in areas where yields are limited by drought. This could also explain the spatial distribution of NUTS3 regions with high R², mainly in the southern part of the study site, and of NUTS3 regions with low R² in the north-western part (maritime climate regions).
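The per-region regression screening discussed above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses a simple linear regression of official on simulated yield, and it judges significance by comparing the absolute t statistic of the slope with a critical value supplied by the caller. The function names, region labels and yield series are hypothetical.

```python
from math import sqrt
from statistics import mean


def r_squared_and_t(simulated, official):
    """Return (R^2, |t|) for the simple linear regression official ~ simulated.

    |t| is the absolute t statistic of the slope, with n - 2 degrees of freedom.
    """
    n = len(simulated)
    mx, my = mean(simulated), mean(official)
    sxx = sum((x - mx) ** 2 for x in simulated)
    syy = sum((y - my) ** 2 for y in official)
    sxy = sum((x - mx) * (y - my) for x, y in zip(simulated, official))
    r2 = sxy * sxy / (sxx * syy)
    t_abs = float("inf") if r2 >= 1.0 else sqrt(r2 * (n - 2) / (1.0 - r2))
    return r2, t_abs


def significant_regions(series_by_region, t_crit):
    """Map region -> R^2 for regions whose |t| exceeds the critical value t_crit."""
    kept = {}
    for region, (simulated, official) in series_by_region.items():
        r2, t_abs = r_squared_and_t(simulated, official)
        if t_abs > t_crit:
            kept[region] = r2
    return kept


# Made-up 8-year series (kg/ha) for two hypothetical regions, not data from the study.
data = {
    "region_A": ([6000, 6500, 5800, 7000, 6200, 6800, 5900, 6600],
                 [5100, 5600, 4900, 6000, 5200, 5900, 5000, 5500]),
    "region_B": ([6000, 6500, 5800, 7000, 6200, 6800, 5900, 6600],
                 [5400, 5300, 5200, 5350, 5500, 5250, 5450, 5300]),
}
# 2.447 is the two-sided 5% critical value of Student's t for 8 - 2 = 6 degrees of freedom.
result = significant_regions(data, t_crit=2.447)
print(result)
```

Counting the keys of the returned dictionary for each indicator and each model run gives tallies of the kind reported in the text (for example, 17 out of 28 regions for grain maize).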
The satisfactory fit observed during the calibration phase for winter wheat was not confirmed by the validation step. Even though TSUM1 and TSUM2 were adjusted and looked spatially more consistent after recalibration for both crops, the overall CGMS performance was only slightly improved, for several reasons. First, the spatial distribution of observed phenological data is uneven (particularly for winter wheat), although this is probably of minor influence given the limited temperature gradient over the study area. Second, factors such as pests and diseases are not included within CGMS and may be responsible for a substantial part of the inter-annual variability of yield in the different NUTS3 regions of the study area. This applies particularly to winter wheat. Similar results have been reported for Danish conditions (Olesen, Bøcher, & Jensen, 2000).
Integrating the effect of diseases in WOFOST/CGMS has been assessed through the recalibration of another parameter (i.e., the leaf span parameter; El Jarroudi et al., 2012) and could be explored in such an approach. Another reason is the lack of information on the accuracy of official reported yields, which may affect the results of the regression analyses. Finally, although TSUM1 and TSUM2 are important parameters for crop phenological development in WOFOST/CGMS, their recalibration through the approach used in this study had a limited effect on the inter-annual variability of the CGMS simulation results. This calls for other sensitive parameters to be included in the recalibration process. Direct calibration on regional yield statistics is questionable as it lumps the influence of many factors into one value and invalidates the biophysical interpretation of model parameters (Hansen & Jones, 2000). With the rapid expansion of constellations of high spatial resolution instruments and the use of data from administrative systems, the spatial distribution of key crop growth model parameters at NUTS3 level should be readily assessed. Further research is currently being carried out based on green area index data retrieved from MODIS observations (Duveiller, Baret, & Defourny, 2011).

Figure 4. Scatterplots of observed vs. predicted day of anthesis (DOA) and day of maturity (DOM) for winter wheat (A) and grain maize (B)

Figure 5. Spatial distribution of the average maximum leaf area index (LAI) and harvest index for winter wheat over the study site. Results of the 'classic' simulation are on the left; results of the 'recalibrated' simulation are on the right.

Figure 6. Spatial distribution of the average maximum leaf area index (LAI) and harvest index for grain maize over the study site. Results of the 'classic' simulation are on the left; results of the 'recalibrated' simulation are on the right.

Figure 7. Coefficient of determination (R²) between simulated potential grain (PG) and official grain yields at NUTS3 level for winter wheat. Hatched NUTS3 regions are those with a significant R² (P < 0.05).

Figure 8. Coefficient of determination (R²) between simulated water-limited grain (WG) and official grain yields at NUTS3 level for grain maize. Hatched NUTS3 regions are those with a significant R² (P < 0.05).

Table 1. Recalibrated and default TSUM1 and TSUM2 parameters (expressed in °C.day) for each variety zone, along with the summary statistics obtained for recalibrated parameters. NA: not available.

Table 2. Summary statistics of simulated crop biomass and grain yield in potential and water-limited conditions within the 300 x 300 km study site (at the end of the growing season). PB, PG: Potential biomass and grain yield, respectively; WB, WG: Water-limited biomass and grain yield, respectively. Values are expressed in kg.ha⁻¹. SE: Standard error.

Table 3. Correlation matrix between classic and recalibrated crop simulation results at the end of the growing season for winter wheat and grain maize at NUTS3 level over the 1990-2007 period.

Table 4. Coefficient of determination derived from the relationships between official winter wheat yields and simulated crop values in the study site over the 1990-2007 period. Values shown in the table are significant at the 0.05 level.

Table 5. Coefficient of determination (R²) derived from the relationships between official grain maize yields and simulated crop values in the study site over the 1990-2007 period.