A Statistical Modeling for Estimation of Wheat Yield Using Weighted Rainfalls in the Punjab, Pakistan

In this paper an effort has been made to develop a statistical model for wheat production in the Punjab, Pakistan. All the variables that have either a light or a strong impact on the yield of wheat, as surveyed by the Agriculture Department and the Meteorological Department, Punjab, have been taken into consideration. The proposed model is not bound to a particular time period and gives a good estimate of wheat production using all the inputs of the crop in a particular season. A new concept of weighted rainfalls has been incorporated in the model, and a detailed comparative study between actual total and weighted rainfalls has been carried out at both divisional and provincial levels to select the best acceptable model and to observe the impact of both types of rainfall on the quality of yield estimation. The predictive performance of the final selected model has been assessed through various validity tests. For this purpose, data in hard form covering 25036 cases and comprising almost 1.2 million values have been entered and used. The study is helpful for agricultural planners with respect to the most important staple food of Pakistan. Many recommendations about different inputs of the crop can also be made to farmers.


Introduction
The area of Pakistan is 796,096 sq. km and the population is almost 170 million. The country has four provinces, namely Sindh, Punjab, Khyber Pakhtoonkawa and Balochistan. The Punjab, being the second largest province (its area is 205,345 sq. km) and the most agricultural province of the country, contributes almost 75% of the total wheat production of Pakistan.
There are two zones of the Punjab: lower and upper. In the Kharif season (May to September), rice is sown in the upper Punjab and cotton in the lower Punjab. Wheat is sown in the Rabi season (October to April) throughout the province.
There are two major categories of area in the Punjab: irrigated and un-irrigated (Barani). In irrigated areas, canal as well as tube-well water is available, so farmers are comparatively less dependent on rainfall. In Barani areas, crops depend entirely on timely rainfall, as no other source of water is available. Most of the upper Punjab is hilly and Barani.
The Punjab is administratively and geographically divided into 133 tehsils, 36 districts and 9 zones (each zone is called a Division). Each division comprises 3 to 6 districts with meteorologically similar conditions. The district headquarters of each division bears the same name as the division.
Secondary data on wheat production for the years 2005-06 to 2008-09 have been taken from the Crop Reporting Service (CRS), an attached wing of the Agriculture Department, Government of the Punjab, which is solely responsible for handling all kinds of agricultural statistics in the Punjab.
The total number of selected sample villages is 1086, and in each sample village six randomly selected plots of 15 × 20 ft (300 ft²) in three randomly selected fields of wheat have been harvested. The yield of each plot, along with all 16 variables (having 43 categories) that affect the yield of the crop, has been recorded. The variables recorded for each field are as follows:
In total, yield data of 25036 plots from all over the Punjab, amounting to almost 1.2 million values, have been used in the study.

Results and Discussion
Various models at different stages and for different categories of data have been developed, and their utility has been discussed through empirical study. This section is organized according to the stages and categories under discussion.

Choice of Regression Technique
In the first step of model building, the type of regression model is specified. As the response variable is continuous and quantitative, and 15 of the 43 explanatory variables are categorical, the 'Multiple Linear Regression' technique is proposed for model building. The important assumption of normality of the response variable is verified and illustrated by the histogram and P-P plot of the yields of 25036 sample plots from all over the province, given in Figure 1.
Linearity of the relationship between the response and explanatory variables is not checked when some of the predictors are categorical (Montgomery, 2001). Also, the distribution of the error terms is the same as that of the response variable.

Choice of Model With/Without Intercept
Both types of Linear Regression Model, with and without intercept, are used in model building. The choice is based on the predictive performance of the models: the model with the better prediction quality is selected for operational use.
Initially, two models at the Punjab level are developed: one with an intercept term and one without.
For each type, two further models have been developed: one using weighted rainfalls (Qayyum, 2010) and one using total rainfalls as one of the regressor variables. Table 1 gives the important information for both types of models.
Comparing both types of models, it is evident that all are significant, as the P-value is much less than 0.05. But there is a considerable difference between the values of R² of the two types: the model without intercept accounts for 92.4% of the variation in the yield, whereas the model with intercept explains only 38.5% and 38.3% of the variation of the response variable in the case of weighted and total rainfalls, respectively.
Though the value of R² for the model without intercept is much higher than that of the model with intercept, the Mean Square Error (MSE) of the model without intercept is larger than that of the model with intercept. When comparing two models, the Sum of Squares of Errors is a more important measure of predictive performance than R² (Hahn, 1979), i.e. improvement in the quality of prediction is measured by a lower value of MSE rather than by a higher value of R². So it is evident from Table 1 that the models with intercept, for both weighted and total rainfalls, give better prediction quality than the models without a constant term. Hence a model without a constant term will not be used as the final model for wheat projection. Also, a model without intercept increases multicollinearity.
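One common reason for a large gap in R² between models with and without intercept is that, for a no-intercept fit, many packages report an uncentered R² (variation measured around zero rather than around the mean). The following minimal sketch illustrates this on synthetic data; it assumes, without confirmation from the study, that the reported R² values were computed in this way, and all names and numbers in it are illustrative only.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients and residuals for y ~ X.
    X must already contain a column of ones if an intercept is wanted."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

def r2_centered(y, resid):
    # Usual R^2: total variation is measured around the mean of y.
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def r2_uncentered(y, resid):
    # R^2 often reported for no-intercept fits: variation measured around zero.
    return 1.0 - (resid @ resid) / np.sum(y ** 2)

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(0.0, 10.0, size=n)
# Synthetic "yields" with a non-zero baseline; NOT the CRS survey data.
y = 8.0 + 0.3 * x + rng.normal(0.0, 2.0, size=n)

X_with = np.column_stack([np.ones(n), x])   # intercept + one regressor
X_without = x.reshape(-1, 1)                # regressor only

_, e_with = fit_ols(X_with, y)
_, e_without = fit_ols(X_without, y)

mse_with = (e_with @ e_with) / (n - 2)
mse_without = (e_without @ e_without) / (n - 1)

print("with intercept   : R2 =", round(r2_centered(y, e_with), 3), "MSE =", round(mse_with, 3))
print("without intercept: R2 =", round(r2_uncentered(y, e_without), 3), "MSE =", round(mse_without, 3))
```

In this sketch the no-intercept fit shows the higher R² and also the higher MSE, which is the same pattern as described for Table 1 and supports judging the models by MSE.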

Divisional Level Modeling
Initially, independent models for each division of the Punjab have been developed and their important parameters compared to check their suitability.
As discussed earlier, agriculture in Pakistan depends largely on climatic conditions, among which the amount of rainfall is the most important and causes significant changes in the production of any crop.
Considering the importance of rainfall for wheat production, three types of models have been developed for each division. In the first, the total rainfall of the whole Rabi season (TotalRains) is one of the explanatory variables; in the second, the weighted rainfall of the season (WtRains) is used; and in the third, the individual rainfall of each Rabi month (AllRains) is used, in each case along with the other explanatory variables.
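The three rainfall variables can be sketched as follows. The weighting scheme itself is defined in Qayyum (2010) and is not reproduced in this paper, so the sketch assumes that the weighted rainfall is a weighted sum of the monthly totals; the weights and rainfall amounts below are purely hypothetical placeholders, chosen only so that January carries the largest weight.

```python
RABI_MONTHS = ["Oct", "Nov", "Dec", "Jan", "Feb", "Mar", "Apr"]

# Hypothetical monthly rainfall (mm) for one location and season.
rain_mm = {"Oct": 10.0, "Nov": 5.0, "Dec": 20.0, "Jan": 30.0,
           "Feb": 25.0, "Mar": 15.0, "Apr": 5.0}

# Hypothetical weights; the real weights come from Qayyum (2010).
weights = {"Oct": 0.5, "Nov": 0.5, "Dec": 1.0, "Jan": 2.0,
           "Feb": 1.5, "Mar": 1.0, "Apr": 0.5}

def total_rains(rain):
    """TotalRains: plain sum of the monthly rainfalls of the Rabi season."""
    return sum(rain[m] for m in RABI_MONTHS)

def weighted_rains(rain, w):
    """WtRains (assumed form): weighted sum of the monthly rainfalls."""
    return sum(w[m] * rain[m] for m in RABI_MONTHS)

def all_rains(rain):
    """AllRains: one separate predictor per Rabi month."""
    return [rain[m] for m in RABI_MONTHS]

print(total_rains(rain_mm))              # single regressor
print(weighted_rains(rain_mm, weights))  # single regressor
print(all_rains(rain_mm))                # seven regressors
```

TotalRains and WtRains each add one regressor to the model, whereas AllRains adds seven, which is why individual months can drop out as insignificant when their rainfall varies little.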
Table 2 has ten columns: (1) 'Division Name'; (2) the total sample share of the respective division; (3) the three types of rainfall-related variables mentioned above, each used as a predictor along with the other predictors; (4) the proportion of explained variation, R²; (5) the value of the Durbin-Watson test as a check for autocorrelation; (6) the value of MSE; (7) the P-value of the F test for significance of the model; (8) the coefficient of the rainfall-related variable in the model; (9) the number of significant variables in the particular model; and (10) the names of the insignificant rainfall-related variables.
A detailed comparison of all this information across the divisional models leads to the choice of the best model.
Column seven of Table 2 gives the P-value of the F test, which shows the significance of the models. It is evident from the table that all models are significant; none is a bad fit. The best fit among these good fits now has to be selected.
It is common to all the divisional models that the R² and MSE of WtRains are never less than the corresponding R² and MSE of TotalRains. In five out of eight divisions, the values of R² and MSE for WtRains are greater than the corresponding values for TotalRains. Only in three divisions, i.e. Lahore, Faisalabad and Sargodha, are the values the same in both cases.
For the rain variable AllRains, the rainfall of each Rabi month has been used as a separate predictor in the model. It is important to highlight that as the variation in the rainfall of a Rabi month decreases, the variable becomes statistically insignificant and is dropped from the model by the software. But it is illogical that rainfall should be insignificant in estimating wheat production.
In all the divisional models with the variable AllRains, the rainfalls of some Rabi months proved insignificant. For instance, in the Gujranwala model the rainfalls of Nov, Dec, Jan and Feb proved insignificant, although this is the prime period of the Rabi season, when the crop needs timely watering. Similarly, in all the other divisional models, the rainfalls of the main Rabi months proved insignificant. Though in some cases these models have a slightly higher value of R² than the WtRains models, a model without rainfalls cannot be preferred as the final operational model for yield estimation of wheat.
As reasoned in the last paragraph, the variable AllRains is dropped, and a selection has to be made between the TotalRains and WtRains models. On comparison of the two types of models, it is evident that the WtRains variable proved insignificant in only one case, Rawalpindi division, where TotalRains and AllRains are also insignificant because of minor variation in the rainfalls. In all the other divisional models, WtRains proved to be a significant contributor to the yield. For example, in the Gujranwala model both TotalRains and AllRains were insignificant, but WtRains was significant.
Another important comparison is made in the eighth column of Table 2, titled 'Coeff in Model'. It gives the coefficients of the TotalRains and WtRains variables in their respective models. In all cases, the coefficient of WtRains is larger than that of TotalRains, which shows that the change in the yield of wheat per unit change in WtRains is greater than that per unit change in TotalRains, i.e. WtRains contributes more strongly than TotalRains. For example, in the Bahawalpur model both WtRains and TotalRains are significant in their own models, and their coefficients are 0.066 and 0.019, respectively. This can be interpreted as follows: a 1 mm increase in weighted rainfall causes a 66 g increase in the yield of wheat per plot of 300 ft², whereas the corresponding increase for TotalRains is 19 g. Importantly, because of this better contribution, the MSE for WtRains is less than that for TotalRains.
The fifth column of Table 2 shows the value of the Durbin-Watson test as a check for autocorrelation. In all cases except the models for Rawalpindi and Sargodha Divisions, its value is more than 1, and many researchers have treated a Durbin-Watson value above 1 as acceptable.
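For reference, the Durbin-Watson statistic is the sum of squared successive differences of the residuals divided by the sum of squared residuals. It lies between 0 and 4, and values near 2 indicate no first-order autocorrelation. A minimal sketch, with hand-made residual sequences chosen only to show the two extremes:

```python
import numpy as np

def durbin_watson(residuals):
    """Durbin-Watson statistic: sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# Illustrative residual sequences, not taken from the study.
print(durbin_watson([1.0, 1.0, 1.0, 1.0]))    # identical residuals: strong positive autocorrelation
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # alternating signs: strong negative autocorrelation
```

The statistic depends on the order of the observations, so for survey data of this kind its value reflects whatever ordering of the plots was used when the model was fitted.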

Provincial Level Modeling
After the discussion of divisional level modeling, various models at the Punjab level have been constructed. On comparison of the divisional and provincial level models, the best one is to be selected.
Like the divisional models, all the models at the provincial level are significant, i.e. none is a bad fit, as shown in column 9 of Table 3. Following the divisional level pattern, models for three types of area of the Punjab were developed: the first for irrigated area, the second for un-irrigated area and the third for irrigated and un-irrigated areas combined. In total, 90% of the Punjab area is irrigated and only the remaining 10% is un-irrigated, with a significant difference in the production of wheat. The same parameters have been used for comparison as in the divisional level modeling.
The fourth column of Table 3 shows that the value of R² for WtRains is higher than that for TotalRains and, more importantly, the MSE for WtRains is less than that for TotalRains at all levels of modeling.
Here also the individual rainfalls of important Rabi months are insignificant. In the irrigated area model, the rainfalls of Nov, Dec and Mar are insignificant; in the un-irrigated area model, the rainfalls of almost all the Rabi months are insignificant; and in the overall model of the Punjab, the rainfalls of Nov and Mar are insignificant, which does not agree with ground reality. So the model using the individual rainfalls of the Rabi months as predictors is not suitable for projection of the yield of wheat, though it has slightly better values of R² and MSE.
In provincial level modeling, two important criteria for selecting among competing models have been used: the Akaike Information Criterion (AIC) and the Schwarz Information Criterion (SIC) (Gujarati, 2003). Both criteria impose a harsher penalty than Adj R² for including more regressors in the model. Of the two, SIC imposes a harsher penalty on the number of regressors than AIC. The model with the lowest value of AIC / SIC is preferred among its competitors. The main advantage of these criteria is that they are useful for comparing not only the in-sample but also the out-of-sample forecasting performance of a model (Gujarati, 2003). The fifth and sixth columns of Table 3 give the values of AIC and SIC for each corresponding model, respectively. In all the WtRain models, the values of AIC and SIC are lower than in the model with TotalRain, i.e. both criteria indicate, like R² and MSE, that the model with WtRain is better than that with TotalRain. These criteria also support the final selection of the Punjab level model using WtRain as a regressor.
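The two criteria can be sketched in the logarithmic form given in Gujarati (2003): ln AIC = 2k/n + ln(RSS/n) and ln SIC = (k/n) ln n + ln(RSS/n), where k is the number of estimated coefficients including the intercept. Whether Table 3 reports these forms or a software-specific variant is not stated, so this is an illustration only, and the RSS, n and k values below are hypothetical.

```python
import math

def ln_aic(rss, n, k):
    """Log Akaike criterion: penalty 2k/n plus log of average squared residual."""
    return 2.0 * k / n + math.log(rss / n)

def ln_sic(rss, n, k):
    """Log Schwarz criterion: penalty (k/n)*ln(n) plus log of average squared residual."""
    return (k / n) * math.log(n) + math.log(rss / n)

# Hypothetical competing models fitted to the same n observations.
n = 1000
candidates = {
    "model_A": {"rss": 5200.0, "k": 10},
    "model_B": {"rss": 5150.0, "k": 16},
}

for name, m in candidates.items():
    print(name, round(ln_aic(m["rss"], n, m["k"]), 4), round(ln_sic(m["rss"], n, m["k"]), 4))
```

Because ln n exceeds 2 once n is larger than about 7, the SIC penalty per regressor is the larger of the two for any realistic sample size, which is the sense in which SIC is the harsher criterion.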
Excluding the AllRains model, the WtRains model is better than the TotalRains model in all respects in all the remaining models. For example, in the models of both irrigated and un-irrigated areas, R² is higher, MSE is lower and the Durbin-Watson test value is more than 1 for WtRains as compared to TotalRains. So, on the basis of the maximum value of R², the least value of MSE and the lower values of AIC / SIC, the WtRains model for the overall Punjab area is the best one and is selected as the final acceptable model for the estimation of wheat production.
By applying different regression diagnostics, the quality of the final model can be improved, i.e. the value of R² can be increased from 0.386 and the MSE decreased from 5.234, so that better estimates are ultimately achieved.

Final Selection of Model for Wheat Projection
The ANOVA table and other important information for the proposed model, before applying any regression diagnostic, are given in Table 4.
The value of P is 0.000, which shows that the model is significant / appropriate. The value of MSE is 5.234, R² is 0.386 and the value of the Durbin-Watson test for autocorrelation is 1.015, which shows no serious problem of autocorrelation in the response variable.
Some further assumptions of Multiple Linear Regression are now verified and regression diagnostics are applied, so that the quality of the proposed model may be improved and a final version of the operational model obtained.

Constant Variance
In Multiple Linear Regression, an important assumption is that the response variable must have constant variance. Figure 2 shows the behavior of the variation in the data.
Figure 2 shows that there is a fixed or constant variance in the response variable, i.e. the yield of wheat is equally dispersed throughout the sample data of 25036 plots. This is shown by the rectangular shape of the response variable in its graphical presentation, i.e. there is no upward, downward or cyclical trend in the data.

Outliers
Figure 3 shows the values of the standardized residuals obtained by fitting a Multiple Linear Regression model to the given data.
As depicted in Figure 3, some of the 25036 observations lie beyond the limits of ±3; these are outliers. Omitting these outliers from the data gives Figure 4. The total number of observations is now 24902, i.e. 134 values (only 0.535%) have been dropped as outliers. Removing the outliers improves the quality of projection by the proposed model, as shown in the ANOVA Table 5.
The new, improved model is also a good fit, as the value of P < 0.05; the value of MSE is reduced to 4.941, R² is increased to 0.403 and the value of the Durbin-Watson test is increased to 1.029. These indicate that excluding the outliers from the data has improved the quality of the model.
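The outlier rule described above can be sketched as follows: fit the model, divide each residual by the residual standard error, and drop observations whose standardized residual exceeds 3 in absolute value. The sketch assumes this simple form of standardized residual (residual divided by the square root of MSE) and uses synthetic data with three deliberately corrupted observations; it is not the study's own procedure or data.

```python
import numpy as np

def drop_outliers(X, y, limit=3.0):
    """Return (X_kept, y_kept, keep_mask) after removing rows whose
    standardized residual lies outside [-limit, +limit]."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, p = X.shape
    s = np.sqrt((resid @ resid) / (n - p))   # residual standard error, sqrt(MSE)
    keep = np.abs(resid / s) <= limit
    return X[keep], y[keep], keep

rng = np.random.default_rng(1)
n = 200
x = rng.uniform(0.0, 10.0, size=n)
X = np.column_stack([np.ones(n), x])
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, size=n)
y[[10, 50, 120]] += 15.0                     # three artificial outliers

X_clean, y_clean, keep = drop_outliers(X, y)
print("dropped rows:", np.flatnonzero(~keep))
```

After dropping the flagged rows the model is refitted on the remaining data, which is what produces the lower MSE and higher R² of the kind reported in Table 5.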

Influential / Leverage Values
An influential or leverage value may have a great impact on the overall quality of prediction by the model. Illogical signs of the coefficients of predictor variables that cannot be interpreted with reference to the real ground situation, unstable coefficients, and the dropping of an important variable as statistically insignificant can all be caused by influential values.
Given the importance of influential values, two techniques have been used to detect them: Cook's Distance and Covariance Ratios.
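The covariance ratio of an observation compares the determinant of the estimated coefficient covariance matrix with that observation deleted to the determinant with all observations included. The sketch below computes it directly from this definition by refitting with each row left out, which is only practical for small samples. The cutoff |COVRATIO - 1| > 3p/n is the commonly cited rule of Belsley, Kuh and Welsch; the paper does not state which cutoff was applied, so that choice and the synthetic data are assumptions.

```python
import numpy as np

def _cov_det(X, y):
    """Determinant of s^2 * (X'X)^-1 for the least-squares fit of y on X."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = (resid @ resid) / (n - p)
    return np.linalg.det(s2 * np.linalg.inv(X.T @ X))

def covratio(X, y):
    """Leave-one-out covariance ratio for every observation (brute force)."""
    n = X.shape[0]
    det_full = _cov_det(X, y)
    ratios = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        ratios[i] = _cov_det(X[mask], y[mask]) / det_full
    return ratios

def flag_influential(X, y):
    """Flag rows with |COVRATIO - 1| > 3p/n (assumed cutoff)."""
    n, p = X.shape
    return np.abs(covratio(X, y) - 1.0) > 3.0 * p / n

rng = np.random.default_rng(2)
x = np.append(np.linspace(0.0, 10.0, 30), 100.0)   # last point has very high leverage
X = np.column_stack([np.ones(x.size), x])
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, size=x.size)

flags = flag_influential(X, y)
print("flagged rows:", np.flatnonzero(flags))
```

For a data set of the size used in this study, statistical packages compute the same quantity from closed-form expressions in the leverage and studentized residual of each observation instead of refitting the model n times.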
A total of 760 (3.04%) of the 25036 observations are discarded as influential values on the basis of their covariance ratios. The ANOVA Table 6 shows the changes in the proposed model.
This model is also significant, i.e. a good fit, as the P-value is less than 0.05. The values of both MSE and R² are improved, to 4.199 and 0.449 respectively. The value of R² is directly related to the variation in the regressor variables (Hahn, 1973). As 21 regressor variables in the study are categorical, the value of R² cannot be increased substantially from its current value. By comparison, the values of R² for the Multi Sensor Estimates (MSE) and QC_Coop estimates in a Multiple Regression Analysis of corn yield and rainfall are 0.24 and 0.22, respectively (Westcott, 2003). The result of the Durbin-Watson test is also better. On the whole, extracting influential values by the covariance ratio technique has improved the quality of the model.

Normality of Residuals
One of the assumptions of Multiple Linear Regression is that the error terms must be normally distributed, i.e. there must be no particular trend in the distribution of the error terms. Figure 5 shows the histogram and P-P plot of the error terms.
Both the histogram and the P-P plot of standardized residuals show that there is no serious departure from normality in the error terms. All the points lie along the straight line in the P-P plot, which shows that neither tail of the distribution is heavy; this is also illustrated by the histogram of the Regression Standardized Residuals.
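A normal P-P plot pairs the empirical cumulative probability of each sorted standardized residual with the cumulative probability that the standard normal distribution assigns to the same value; points close to the 45-degree line indicate approximate normality. The sketch below computes these pairs for synthetic residuals. The plotting position (i - 0.5)/n is one common convention and an assumption here, since packages differ on this detail.

```python
import math
import numpy as np

def pp_points(std_resid):
    """Return (empirical, theoretical) cumulative probabilities for a normal P-P plot."""
    z = np.sort(np.asarray(std_resid, dtype=float))
    n = z.size
    empirical = (np.arange(1, n + 1) - 0.5) / n
    # Standard normal CDF via the error function.
    theoretical = np.array([0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in z])
    return empirical, theoretical

rng = np.random.default_rng(3)
resid = rng.normal(0.0, 1.0, size=2000)      # synthetic standardized residuals
emp, theo = pp_points(resid)
print("largest gap from the 45-degree line:", round(float(np.max(np.abs(emp - theo))), 4))
```

A small maximum gap corresponds to the visual impression of points lying along the straight line in Figure 5.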

Multicollinearity
Multicollinearity refers to linear dependency among the regressor variables, which makes the regression coefficients unstable, leads to illogical or unreliable statistical inferences, and gives unrealistic signs to the regression coefficients. When multicollinearity is present, the model coefficients change dramatically from sample to sample.
The most widely used diagnostic for multicollinearity is the Variance Inflation Factor (VIF). The values of VIF should be as low as possible to indicate the absence of multicollinearity. A VIF of more than 10 shows a serious problem of multicollinearity (Montgomery, 2003). In Table 7 of the final model, the last column shows the value of VIF for each regressor variable. No value of VIF is greater than 10; indeed, none is greater than 8. This is a clear indication that there is no serious problem of multicollinearity in the model, i.e. no explanatory variable is close to being a linear combination of the others.
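The VIF of a regressor is 1 / (1 - R²), where R² comes from regressing that regressor on all the other regressors (with an intercept). A minimal sketch on synthetic columns, one pair of which is made nearly collinear on purpose; the data and names are illustrative only:

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of X (regressors only, no constant column)."""
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

rng = np.random.default_rng(4)
a = rng.normal(size=1000)
b = rng.normal(size=1000)                    # unrelated to a
c = a + 0.1 * rng.normal(size=1000)          # almost a copy of a
print(np.round(vif(np.column_stack([a, b, c])), 2))
```

Here the unrelated column has a VIF close to 1, while the two nearly collinear columns have VIFs far above the threshold of 10 used in the text.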

The Final Model
Following these initial Multiple Linear Regression diagnostics, the final model is presented in Table 7.
Out of a total of 37 explanatory variables, 33 are statistically significant; column six of Table 7 gives the P-value of each regressor variable. For these variables no P-value is greater than 0.05, i.e. they really contribute to the variation in the response variable, the yield of wheat. Also, the standard errors of the regression coefficients are small, i.e. all are less than 1, which shows their stability.

Interpretation of Regression Coefficients
The coefficient of each regressor variable is now interpreted in turn.

a) Constant
The model does not pass through the origin, i.e. it has a non-zero y-intercept. This means that the yield of wheat will not be nil in any case if all other parameters are fixed, except in cases of irregular variation such as flood, hail or fire, which cannot be accommodated in the model.
It is also logically true that, excluding irregular variations, the yield of a plot cannot be zero; that is why a model without intercept has not been selected as the final model for wheat projection, as discussed earlier in Section 2.2 (Choice of Model With/Without Intercept). According to the model, with all regressors at zero or at their base categories, the yield will be 2.976 kg per plot of 300 ft² (11.58 m/ac).

b) Un-irrigated
It was discussed earlier that the agricultural land of the Punjab is divided into two categories: irrigated and un-irrigated.
It is generally known from experience that the production of wheat in irrigated areas is higher than in un-irrigated ones, and the same is shown by the coefficient of the regressor Un-irrigated. The irrigated area type is used as the base category, and the coefficient -1.965 shows that wheat production decreases on average by 1.965 kg per plot (7.64 m/ac or 285.30 kg/ac) when the category is switched from irrigated to un-irrigated.
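The per-acre figures quoted in parentheses throughout this section follow from the plot size. A plot is 15 × 20 = 300 ft² and an acre is 43,560 ft², so one acre holds 145.2 plots. The sketch below assumes that "m/ac" stands for maunds per acre with one maund taken as 37.3242 kg; this reading is not stated in the text but reproduces the quoted values to within rounding.

```python
PLOT_AREA_FT2 = 15 * 20        # sample plot size from the survey design
ACRE_FT2 = 43_560              # square feet in one acre
KG_PER_MAUND = 37.3242         # ASSUMPTION: "m" is the maund of 40 seers

def per_plot_to_per_acre(kg_per_plot):
    """Convert a yield effect in kg per 300 ft^2 plot to (kg/acre, maunds/acre)."""
    kg_per_acre = kg_per_plot * ACRE_FT2 / PLOT_AREA_FT2
    return kg_per_acre, kg_per_acre / KG_PER_MAUND

# Intercept and Un-irrigated coefficient (absolute value) from the final model.
for label, coef in [("Constant", 2.976), ("Un-irrigated", 1.965)]:
    kg_ac, maund_ac = per_plot_to_per_acre(coef)
    print(f"{label}: {kg_ac:.2f} kg/ac, {maund_ac:.2f} m/ac")
```

The same conversion applies to every coefficient interpreted below, since all of them are expressed per plot of 300 ft².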

c) Variety
There are more than six varieties of wheat seed in common use in the Punjab. But half of the farmers in the province use the Inqlab-91 variety, so it has been used as the base category. The coefficient 0.290 of the regressor variable Variety shows that the production of other varieties is higher than that of Inqlab-91, i.e. production increases on average by 0.290 kg per plot (1.13 m/ac or 42 kg/ac) when the variety is switched from Inqlab-91 to others.

d) Seed_From
In the Punjab, wheat seed is obtained from two sources: government certified seed and the farmer's own home seed. Only 20% of farmers use certified wheat seed. As a large majority of farmers in the province use home seed, it has been used as the base category. The coefficient 0.415 of the variable Seed_From shows that wheat production increases on average by 0.415 kg per plot (1.61 m/ac or 60.25 kg/ac) when the source of seed is switched from Own Home to Certified.

e) DAP Fertilizer
Two major types of fertilizer are applied to the wheat crop in the Punjab: DAP and Urea. Both are included in the study as separate regressor variables. The quantity of fertilizer used has been measured in kg, not in number of bags. The coefficient of the variable DAP shows that wheat production increases on average by 0.022 kg per plot (0.086 m/ac or 3.19 kg/ac) for a 1 kg increase in the quantity of DAP.

f) Urea Fertilizer
Urea and DAP are applied to the wheat crop in a specific proportion. The fertilizers are rarely used alone, because a combination of both gives a better result than the application of either one.
The coefficient of Urea shows that the production of wheat increases on average by 0.015 kg per plot (0.058 m/ac or 2.18 kg/ac) for a 1 kg increase in the quantity of Urea.

g) Number of Plough (No_Plough)
Tractors are mostly used for ploughing in the province. The variable has been recorded as the number of times the field was ploughed before the sowing of wheat. Good preparation of the soil by ploughing leads to better production of the crop. The coefficient of No_Plough shows that the yield of wheat increases on average by 0.173 kg per plot (0.67 m/ac or 25.12 kg/ac) for each additional ploughing.

h) Number of Level (No_Level)
After ploughing, leveling is used for even distribution of water, seed and fertilizers throughout the field. Poor leveling may cause poor distribution of inputs and consequently poor production of the crop. The variable No_Level is also measured as a count of the number of levelings applied. Its coefficient shows that the yield of wheat increases on average by 0.088 kg per plot (0.34 m/ac or 12.78 kg/ac) for each additional leveling.

i) Number of Water (No_Water)
Variation in the number of waterings applied to the crop depends on the availability of timely rainfall. When timely rainfall increases, the number of waterings decreases, and vice versa. The variable No_Water is measured as a count of the total number of waterings applied to the crop from sowing to maturity. Its coefficient shows that the average production of wheat increases by 0.117 kg per plot (0.46 m/ac or 16.99 kg/ac) for each additional watering.

j) Disease
The wheat crop may face various diseases during its life because of a shortage or excess of an input or because of climatic or soil disturbance. There is a considerable difference in yield between healthy and diseased crops, and the same is shown here by the coefficient of the variable Disease. It has been measured as Yes (diseased) and No (no disease), and the latter has been used as the base category. The coefficient of the variable Disease shows that the production of wheat decreases on average by 0.262 kg per plot (1.02 m/ac or 38.04 kg/ac) when switching from a healthy to a diseased crop.

k) Spray
Spraying of pesticides against particular diseases or weeds is essential for better production when the crop faces such a problem. Along with the variable Disease, it is recorded whether or not the farmer has used any spray on the crop. Like the Disease variable, it has been measured as Yes (sprayed) and No (not sprayed), with the latter as the base category. Its coefficient shows that the production of the crop increases on average by 0.756 kg per plot (2.94 m/ac or 109.76 kg/ac) when switching from the no-spray to the sprayed category. This also emphasizes the usefulness of pesticide spray on the crop.

l) Weighted Rainfalls (WtRain)
The concept of weighted rainfalls, as discussed earlier, is incorporated in the model and has a coefficient of 0.013, which shows that a 1 mm increase in the amount of weighted rainfall causes an increase of 0.013 kg in the yield of wheat per plot (0.051 m/ac or 1.89 kg/ac). For the regressor variable TotalRain the coefficient is 0.003, which shows that TotalRain causes a smaller increase in the yield than WtRain.

m) Sowing Time (STNov1, STNov2, STDec1, STDec2)
The sowing time of wheat plays an important role in the production of the crop. As sowing is delayed, the yield gradually decreases. The same pattern is shown by the variables related to the sowing period.
Sowing time is divided into five categories: up to October (base category), first half of November (STNov1), second half of November (STNov2), first half of December (STDec1) and second half of December and later (STDec2). Table 8 shows the changes in the yield of wheat in the different sowing periods with reference to the base category.
In parentheses, a positive sign shows a direct and a negative sign an inverse relation between sowing time and quantity of yield. As indicated in the table, the first half of November is the best period for obtaining the maximum yield of wheat. As the sowing of the crop is delayed, the yield gradually decreases with respect to the base category of sowing period (up to October), and the minimum yield is obtained from the crop sown in the second half of December or later.
These results agree closely with ground reality and with the instructions / guidelines given to farmers by the Agriculture Department of the Punjab.
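The base-category coding used for sowing time, and for the other categorical regressors above, can be sketched as follows: each non-base category gets its own 0/1 indicator column, and the base category is the row of all zeros, so every coefficient measures the change in yield relative to the base. The label "UpToOct" for the base category is a hypothetical name introduced here for illustration.

```python
# Non-base categories, in the order of the indicator columns.
LEVELS = ["STNov1", "STNov2", "STDec1", "STDec2"]
BASE = "UpToOct"   # hypothetical label for the base category (sown up to October)

def encode_sowing_time(categories):
    """Return one row of 0/1 indicators per observation; the base category maps to all zeros."""
    rows = []
    for c in categories:
        if c != BASE and c not in LEVELS:
            raise ValueError(f"unknown sowing-time category: {c}")
        rows.append([1 if c == level else 0 for level in LEVELS])
    return rows

sample = ["UpToOct", "STNov1", "STDec2"]
for label, row in zip(sample, encode_sowing_time(sample)):
    print(label, row)
```

With this coding, the coefficient of STNov1 in Table 8 is read as the average difference in yield between plots sown in the first half of November and plots sown up to October, holding the other regressors fixed.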

n) Variables of Seasonal Temperatures
The production of wheat is extremely dependent on the climatic parameters of the province, among which the temperatures in the months of the Rabi season are very important. The average maximum and minimum temperatures of each Rabi month have been used as separate regressor variables. Only the average maximum temperature of March (T_Mar_Max) proved statistically insignificant; all the other 13 temperature variables contribute significantly to the yield of the crop.
Table 9 summarizes the interpretations of the coefficients of the temperature variables.
The coefficient of the variable T_Oct_Min (average minimum temperature of October) can be interpreted as follows: a 1°C increase (decrease) in the average minimum temperature of October causes a decrease (increase) in the yield of wheat of 0.024 kg per plot (0.093 m/ac or 3.485 kg/ac). But the relation between the average maximum temperature of October (T_Oct_Max) and yield is positive, which can be interpreted to mean that as the difference between the maximum and minimum temperatures of October increases, the production of wheat also increases, and vice versa. The same opposite pattern between maximum and minimum temperatures can be seen for December and February. This pattern could be reversed for negative temperatures, but the data contain no negative temperatures, as in the Punjab winter the temperature rarely falls below 0°C, especially in wheat-growing areas.
The temperatures of November, January and April have the same directional pattern. For the average maximum temperature of January (T_Jan_Max), the yield per plot increases by 0.085 kg (0.331 m/ac or 12.341 kg/ac) as the average maximum temperature increases by 1°C. Almost the same increase in yield results from a 1°C increase in the average minimum temperature of the month (T_Jan_Min). Taken together, this means that when January becomes colder, the yield of wheat is negatively affected. This also agrees with the real ground situation: in winter, when the temperature falls considerably, a thin layer of frost appears on top of the wheat plants at dawn, which damages the germination of the crop. It is important to highlight that the minimum temperature of the winter season in the Punjab is mostly recorded in January.
The average minimum and maximum temperatures of April (T_Apr_Min and T_Apr_Max) have negative coefficients, which means that they have an inverse relation with the yield of wheat. The coefficient of T_Apr_Min is -0.038, which can be interpreted to mean that as the temperature increases by 1°C, the yield of an experimental plot decreases by 0.038 kg (0.148 m/ac or 5.517 kg/ac). A similar change occurs for the maximum temperature of the month.
This also agrees closely with real ground observations. At the end of March and the start of April, the wheat crop is in its last stage of maturity and the development of the grain is at its peak, as it gains weight and size during this period.
If the temperature rises unusually during this period, the development of the grain stops immediately, the grain dries out at a lower weight and size, and consequently the production of the crop decreases. This problem occurred in the Rabi season 2010-11, when the production of wheat decreased considerably compared to previous years as the maximum temperature rose to 36°C from a usual average of 25°C during this period. A moderate temperature during April leads to better production of wheat.

o) Variables of Seasonal Humidity
The average humidity level (proportion of moisture in the air) of each Rabi month has been used as a separate regressor variable in the model. The humidity of two Rabi months, February and March (Humidity_Feb and Humidity_Mar), proved statistically insignificant and was dropped from the study. Humidity is also one of the parameters on which wheat production depends. Table 10 gives a summary of the interpretation of the significant humidity variables.
The humidity of October and January has a positive impact, and that of November, December and April a negative impact, on the yield of wheat. The largest impacts of humidity on yield are those of January and April. The coefficient of the variable Humidity_Jan can be interpreted to mean that a 1% increase in humidity causes a 0.032 kg increase in the yield of wheat per plot (0.124 m/ac or 4.646 kg/ac). Similarly, the largest negative change, for Humidity_Apr, is interpreted as a 0.034 kg decrease in the yield per plot (0.132 m/ac or 4.936 kg/ac) for a 1% increase in the humidity of the month.
A main cause of an increase in humidity is rainfall. The rainfall of January has the maximum weight, i.e. January rainfall has the largest positive impact on the production of wheat compared to all the other Rabi months. The same pattern can be observed for January humidity, which also has the largest positive impact on the yield. The results for weighted rainfall and for January humidity thus validate each other.
The harvesting of wheat starts in mid-April and is completed at the end of April or, at the latest, in the first week of May. Dry and hot weather is essential for harvesting the crop. When the crop is ready for harvesting, heavy rainfall and an increase in humidity damage the crop, as it lies flat on the ground and production is lost. Reflecting the real ground situation, Humidity_Apr has the largest negative coefficient.
Vol. 3, No. 4; December 2011. Published by Canadian Center of Science and Education.

Validation of the Model
After the model has been developed and all the necessary diagnostics carried out, its validity is examined before it is put into operational use. Validity here means the behavior of the model in predicting the yield for the given set of data, any discrepancies that arise in its working form, and the quality of its predictions for a new set of data.
In this section, the actual yield is compared with the yield projected by the model for the given set of data at district, divisional and provincial levels, independently for irrigated and un-irrigated areas, and the quality of projection is assessed. Further, to observe the behavior of the model for a new set of data, wheat data of Sargodha division, comprising Sargodha, Khushab, Mianwali and Bhakhar districts, for the year 2009-10 are used and the actual and predicted values are compared.

Analysis of Magnitudes / Signs of Coefficients of the Model
In the first phase of validating a model, the magnitudes and signs of its coefficients are compared with prior experience, physical theory and their realistic status (Snee, 1977).
In the whole model, no coefficient of a regressor variable has an unrealistic or unstable sign, i.e. all signs agree with the physical theories of agricultural science, prior small-scale experiments and the general perception regarding the production of wheat. The logical meaning of each sign has been discussed in the individual interpretation of each regressor variable.
In magnitude, all coefficients are reasonable and none shows an extraordinary increase or decrease. For easier comparison, each coefficient has been expressed per plot, in m/ac and in kg/ac, and none shows an exceptional value.
The stability of the coefficients can be examined through their standard errors S.E (β) and VIF values, given in the fourth and last columns of Table 7, respectively.
The maximum S.E (β) is 0.076 (only 5.5% variation), for the regressor variable STDec2 (sowing time of wheat in the second half of December); although it is the highest standard error, it is a reasonable value. All other regressor variables have standard errors well below 0.076, which reflects sound stability of the coefficients of the final model.
Similarly, the maximum VIF in the multicollinearity check is 7.252, for the regressor variable Humidity_Dec. This is acceptable, as only a VIF greater than 10 reflects a serious multicollinearity problem (Montgomery, 2003). All the remaining regressor variables have VIFs well below 7, which indicates stability of the coefficients of the model.
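As an illustration of the multicollinearity check, the following minimal sketch computes the variance inflation factor of each regressor as VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing regressor j on all the others. It is not the authors' code: the data are synthetic, the function name `vif` is hypothetical, and only the threshold of 10 is taken from the text.

```python
import numpy as np


def vif(X):
    """Return the VIF of each column of X (rows = observations)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = []
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress column j on the remaining columns plus an intercept.
        A = np.column_stack([np.ones(n), others])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        ss_res = resid @ resid
        ss_tot = ((y - y.mean()) ** 2).sum()
        r2 = 1.0 - ss_res / ss_tot
        out.append(1.0 / (1.0 - r2))
    return np.array(out)


# Synthetic regressors: c is nearly collinear with a, b is independent.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + 0.1 * rng.normal(size=500)
vifs = vif(np.column_stack([a, b, c]))
flagged = vifs > 10  # rule of thumb cited in the text
print(vifs, flagged)
```

In this synthetic case the two nearly collinear columns exceed the threshold while the independent one stays close to 1, which is the pattern the check is meant to expose.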

Confirmation Runs
In this phase of validity checking, the predictive performance of the model is examined on the given set of data from different angles: its projection behavior for the whole data, for the various divisional data segments, for irrigated / un-irrigated areas and for a fresh set of data. A reasonable level of projection accuracy indicates that the model is acceptable for operational use.

a) Confirmation Runs for the Whole Data
Table 11 compares the actual average yield of wheat (m/ac) with the yield projected by the model at district, divisional and provincial levels, independently for irrigated / un-irrigated areas.
The overall Squares of Residuals (SR) show that the predictive performance of the model is good. At the Punjab level, SR is 0.004 for irrigated and un-irrigated areas combined, which is quite good. For irrigated and un-irrigated areas separately, it is 0.000 and 0.051 respectively, which shows that the model predicts better for irrigated than for un-irrigated areas.
At the divisional level, the highest SR is 1.638, for D.G.Khan division; although it is the maximum, it is an acceptable value. At the district level, SR reaches two digits in only two of the 35 districts, Narowal and Okara, at 11.244 and 13.144 respectively. In all the remaining districts, it is less than 10.
Larger values of SR occur only where the number of data points is comparatively small. For instance, Rawalpindi and Chakwal districts mostly comprise un-irrigated areas and only a few sample points fall in irrigated areas; accordingly, SR for the irrigated area is 23.267 and 76.470, and for the un-irrigated area 1.317 and 0.011, respectively. The results clearly show that the model behaves well when the number of data points is sufficient.
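The comparison can be sketched as follows, assuming that SR for a group (district, division, province, or area type) is the squared difference between the actual and projected average yields in m/ac. This section does not spell out the formula, so that definition is an assumption, and the plot-level numbers below are hypothetical and not taken from Table 11.

```python
from collections import defaultdict


def squared_residual_by_group(records):
    """records: iterable of (group, actual_yield, projected_yield) in m/ac.

    Returns {group: (mean actual - mean projected) ** 2}.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for group, actual, projected in records:
        s = sums[group]
        s[0] += actual
        s[1] += projected
        s[2] += 1
    return {g: ((a / n) - (p / n)) ** 2 for g, (a, p, n) in sums.items()}


# Hypothetical plot-level values, for illustration only.
records = [
    ("irrigated", 30.0, 29.5),
    ("irrigated", 32.0, 32.5),
    ("un-irrigated", 12.0, 13.0),
    ("un-irrigated", 14.0, 14.0),
]
sr = squared_residual_by_group(records)
print(sr)
```

Because the group averages are compared, individual plot errors can cancel within a large group, whereas a group with only a few plots has little room for such cancellation.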

b) Confirmation Runs for Yearly Data
Table 12 compares the estimates of wheat by the model with the actual production, segregated by year.
At the Punjab level, SR is below 1 in all the years, i.e. the projection quality of the model for each individual year is also quite reasonable. The minimum SR, 0.044, is for 2005-06 and the maximum, 0.570, is for 2008-09; the difference between them is small, so the model can be said to be equally good for each year. Similarly, the SR values of the model estimates for irrigated and un-irrigated areas separately are mostly less than 1 in each year, i.e. the model is an equally good predictor for both types of areas.

c) Confirmation Runs for Fresh Data
To assess the projection behavior of the model for a new set of data, wheat production data of 786 randomly selected plots from Sargodha division, comprising the four districts Sargodha, Khushab, Mianwali and Bhakhar, for the year 2009-10 have been used. Table 13 gives the divisional and district estimates of wheat by the model along with the actual production results.
The main problem with wheat production in the year 2009-10 was that the temperature rose suddenly and unexpectedly at the end of March, so that the wheat grain dried out at a small size, i.e. grain development stopped before its routine maturity level, and as a result the production of the crop decreased unusually. This was quite a rare variation, which usually occurs once in a decade. The model accommodates all the routine variations of the Rabi season well and returns good estimates of wheat irrespective of any particular year and zone.

Conclusions
The objective of the study has been achieved. A subjective approach has been converted into an objective one.
The proposed model is independent of time binding, i.e. it can be valid for any year, as observed in the model validity Section 2.7. Given the input parameters of a particular year, it returns the estimate of wheat for that year.
It is important to highlight that the regressor variables involved in the model do not change dramatically from year to year, which is why the average yield of wheat per acre remains almost constant, with only slight variation each year. The total wheat production of the province mainly depends on the total area under cultivation of the crop each year.
The proposed model does not behave ideally in estimating yield under irregular variations, which occur rarely. Its predictive performance for a routine year with usual inputs, however, is excellent.
This research work is an effort to develop a statistical model for the projection of the wheat crop, independent of time binding, using the concept of weighted rainfalls. Similar research can be conducted for other major and minor crops. Such models should be independent of time and capable of projecting the yield of their respective crops in all years.
As discussed earlier, the amount of rainfall in each month of the season of a crop plays a significant role in its production. But timely rainfall, even if small in amount, contributes a great deal to the production of a crop, which is why the concept of weighted rainfall has been used in the proposed model for wheat. Similarly, weighted patterns for rainfall, temperature and humidity level can be determined for all crops.
- on crop / seed
- Total Rainfalls in the season
- Spray on crop
- Weighted Rainfalls in the season
- Variety of wheat
- Average Humidity of the season
- Quantity of seed
- Average Max/Min Temperature
- Sowing time
- Irrigated/Un-irrigated area

In Pakistan, the yield of a crop is measured in 'maund' (1 Maund = 37.3242 kg) per 'acre' (1 Acre = 198x220 sq.ft).

Table 1. Comparison of models with / without intercepts

Table 2. Comparison of divisional models of the Punjab

Table 3. Comparison of provincial models of the Punjab
*Selected model for operational use

Table 4. ANOVA Table & other Information

Table 5. ANOVA Table & other Information

Table 6. ANOVA Table & other Information

Table 7. The final model for operational use

Table 8. Sowing time variables interpretation

Table 9. Interpretation of temperature variables

Table 10. Interpretation of humidity variables

Table 11. Comparison of actual and projected districts average yields