Sample Size to Estimate the Mean and Median of Traits in Canola

The aim of this study was to determine the sample size (i.e., number of plants) required to estimate the mean and median of canola (Brassica napus L.) traits of the Hyola 61, Hyola 76, and Hyola 433 hybrids at specified precision levels. At 124 days after sowing, 225 plants of each hybrid were randomly collected. In each plant, a morphological trait (plant height) and productive traits (number of siliques, fresh matter of siliques, fresh matter of aerial part without siliques, fresh matter of aerial part, dry matter of siliques, dry matter of aerial part without siliques, and dry matter of aerial part) were measured. For each trait, measures of central tendency, variability, skewness, and kurtosis were calculated. Sample size was determined by resampling with replacement, using 10,000 resamples. The sample size required to estimate measures of central tendency (mean and median) varies between traits and hybrids. Productive traits required larger sample sizes than the morphological trait. Required sample sizes decrease in the order Hyola 433, Hyola 61, Hyola 76. To estimate the mean of canola traits of the Hyola 61, Hyola 76, and Hyola 433 hybrids with the amplitude of the 95% confidence interval equal to 30% of the estimated mean, 208 plants are required, whereas 661 plants are necessary to estimate the median with the same precision.


Introduction
Canola (Brassica napus L.) belongs to the Brassicaceae family (Tomm, Wiethölter, Dalmago, & Santos, 2009). Canola seeds are used for edible oil for human consumption, livestock feed, and biofuel feedstock, and canola is the second most prominent oilseed crop in the world (Li et al., 2015). In Brazil, canola is grown for grain production during the cool winter season in crop rotation systems. Canola can help control root diseases in annual crops through biofumigation and the absence of a host (Angus et al., 2015). Other synergistic effects on total system yield can be observed, such as weed growth suppression and flexibility in chemical weed control options. Therefore, besides grain yield and oil content, sample sizing for dry matter traits is relevant to assess its cover crop potential.
In Agricultural Sciences research, the number of treatments, number of replicates, plot size and shape, and sample size are important factors to be considered in the planning of field crop experiments. Inadequate sizing of these factors can cause problems in statistical data analysis (Storck, Garcia, Lopes, & Estefanel, 2016), such as a high coefficient of variation and, consequently, low experimental precision and low reliability of the research results. Cargnelutti Filho et al. (2015a) investigated the optimal plot size and the number of replicates for combinations of number of treatments and levels of experimental precision to evaluate the fresh matter of canola (Brassica napus L.). The authors provided substantial information for experimental designs with canola.
Once the size (area) of the experiment (number of treatments × number of replicates × plot size) is defined for a particular crop, the question commonly arises of how many plants (sample size) should be evaluated in each plot of each treatment. Too few plants may not be representative, whereas too many may be unnecessary and impractical. Thus, sampling enough plants for a given precision is essential for estimating statistical measures such as measures of central tendency (mean and median).
In symmetric data distributions of a given trait, the mean is an appropriate measure of central tendency. However, in asymmetric distributions (positive or negative), the median, which separates the ordered set of data into two parts (50% below and 50% above the median), is the measure of position indicated to represent the central tendency of the dataset. The mean has been extensively used in Agricultural Sciences research, but the median would be more adequate in situations of asymmetric data distribution.
Sample sizing studies (determination of the number of plants) can be performed based on data collected from plants of a uniformity trial (blank experiment). Considering that in the experimental area of a uniformity trial with a given crop all procedures (soil preparation, fertilization, sowing, pest and disease control, and evaluations) are carried out uniformly, and disregarding the genetic variability between plants, the remaining variability represents the random (environmental) variation between plants in the experimental area.
Sampling in a uniformity trial should capture the maximum variation that may exist between plants in order to adequately represent the plant population and to enable the study of sample size. Evidently, extrapolations of the sample size to other experimental areas are questionable due to the existence of environmental variations. However, the results of these studies can be used as a reference for the planning of other research and can assist researchers in defining the number of plants to sample.
Resampling with replacement is an appropriate procedure for sample sizing and does not depend on the probability distribution of the data (Ferreira, 2009). This procedure has been used to determine the sample size for mean estimation in agricultural crops such as millet (Kleinpaul et al., 2017) and rye (Bandeira et al., 2018a), and for the estimation of the mean and median in white lupine (Burin, Cargnelutti Filho, Toebe, Alves, & Fick, 2014) and flax (Cargnelutti Filho et al., 2018). Meanwhile, a methodology based on Student's t distribution (Bussab & Morettin, 2017) has been used to determine the sample size for mean estimation in forage turnip (Cargnelutti Filho et al., 2014), black oat (Cargnelutti Filho et al., 2015b), and rye (Bandeira et al., 2018b). These studies have demonstrated sample size variability between traits, cultivars, sowing dates, evaluation times, agricultural years, and measures of central tendency (median and mean). They also revealed promising aspects of correct sample sizing.
We assumed that the number of plants required to estimate the mean and median of traits differs between canola hybrids. Thus, the aim of this study was to determine the sample size (i.e., number of plants) required to estimate the mean and median of canola (Brassica napus L.) traits of the Hyola 61, Hyola 76, and Hyola 433 hybrids at specified precision levels.

Material and Methods
Three uniformity trials (experiments without treatments, in which the crop and all procedures performed during the experiment are homogeneous across the experimental area) were carried out with canola (Brassica napus L.) in an experimental area of 45 m × 60 m (2,700 m²) in southern Brazil, located at 29°42′S, 53°49′W, at 95 m altitude. According to the Köppen climate classification, the climate of the region is Cfa, humid subtropical, with warm summers and no defined dry season (Heldwein, Buriol, & Streck, 2009). The soil at the trial site is classified as a sandy loam typic Paleudalf (Santos et al., 2013).
In each uniformity trial, an area of 15 m × 15 m (225 m²) was demarcated. Then, 225 plants were randomly selected in each uniformity trial on October 15, 2013 (124 days after sowing). These plants were spaced 1 m apart in a matrix of 15 rows and 15 columns. The plants were in the grain maturation stage at this time. The plants were cut at the soil surface. Then, the plant height (PH) of each plant was measured in cm. In addition, the number of siliques per plant (NS) was counted. In each plant, the siliques were removed. The fresh matter of siliques (FMS), in g plant⁻¹, the fresh matter of aerial part without siliques (FMWS), in g plant⁻¹, and the fresh matter of aerial part (FM = FMS + FMWS), in g plant⁻¹, were obtained by weighing. After drying in an oven, the dry matter of siliques (DMS), in g plant⁻¹, the dry matter of aerial part without siliques (DMWS), in g plant⁻¹, and the dry matter of aerial part (DM = DMS + DMWS), in g plant⁻¹, were obtained. In this study, PH was considered a morphological trait, and NS, FMS, FMWS, FM, DMS, DMWS, and DM were considered productive traits.
For the eight measured traits (PH, NS, FMS, FMWS, FM, DMS, DMWS, and DM), the calculated statistics were: minimum, percentiles 1, 2.5, and 25, median (percentile 50), percentiles 75, 97.5, and 99, maximum, range, mean, variance, standard deviation, standard error, coefficient of variation, coefficient of skewness, coefficient of kurtosis, and p-value of the Kolmogorov-Smirnov normality test. For each trait, the means of the hybrids (Hyola 61 versus Hyola 76, Hyola 61 versus Hyola 433, Hyola 76 versus Hyola 433) were compared by Student's t-test (two-tailed) for independent samples at 5% significance. Subsequently, 1,999 sample sizes were planned for each trait. The initial sample size was two plants, and the others were obtained by adding one plant at a time up to 2,000 plants. Therefore, the following sample sizes were planned for the simulations: 2, 3, 4, …, 2,000 plants for each trait.
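As an illustration of some of the descriptive statistics listed above, the following minimal sketch computes the mean, median, standard deviation, standard error, coefficient of variation, skewness, and kurtosis for one trait. It is not the authors' code: the function name `describe` is hypothetical, and the moment-based skewness and excess-kurtosis formulas are an assumption, since the text does not state which estimators were used.

```python
import statistics

def describe(values):
    """Descriptive statistics for one trait (moment-based skewness/kurtosis assumed)."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    # Central moments with an n denominator, used for the shape coefficients
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return {
        "mean": mean,
        "median": statistics.median(values),
        "sd": sd,
        "se": sd / n ** 0.5,                 # standard error of the mean
        "cv_percent": 100 * sd / mean,       # coefficient of variation
        "skewness": m3 / m2 ** 1.5,          # > 0 means a longer right tail
        "excess_kurtosis": m4 / m2 ** 2 - 3, # 0 for a normal distribution
    }

# Made-up, right-skewed values (not data from this study)
stats = describe([1, 1, 1, 2, 10])
print(stats["mean"] > stats["median"], stats["skewness"] > 0)
```

With right-skewed values such as these, the mean exceeds the median and the skewness coefficient is positive, which is the pattern the study reports for the productive traits.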
Confidence intervals for the mean and the median can be constructed by resampling, regardless of the probability distribution of the sample data. Thus, for each planned sample size of each trait, 10,000 resamples were drawn with replacement. For each resample, the mean and the median were estimated. Thereby, for each sample size of each trait, 10,000 estimates of the mean and 10,000 estimates of the median were obtained (Ferreira, 2009), and their percentiles 2.5 and 97.5 were calculated. Then, the amplitude of the 95% confidence interval for the mean and for the median, for each sample size of each trait, was calculated as the difference between percentile 97.5 and percentile 2.5. Next, the sample size (i.e., number of plants) required to estimate the mean and median of each trait at each precision level was determined, starting from the initial size (i.e., two plants). The sample size was taken as the number of plants from which the amplitude of the 95% confidence interval was less than or equal to 15% (higher precision), 16%, 17%, …, 30% (lower precision) of the estimated mean or median, respectively. These experimental precisions were considered adequate to estimate the mean and the median.
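The resampling procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation (which used Excel and R): the function names `ci_width` and `required_sample_size` are hypothetical, the reference estimate is assumed to be the statistic computed on the full set of observed plants, and the sample size is taken as the first planned size that meets the precision target.

```python
import random
import statistics

def ci_width(data, n, stat, n_resamples, rng):
    """Amplitude of the 95% percentile interval of `stat` over resamples of size n."""
    estimates = [stat(rng.choices(data, k=n)) for _ in range(n_resamples)]
    # quantiles(..., n=40) returns cut points at 2.5%, 5%, ..., 97.5%
    cuts = statistics.quantiles(estimates, n=40, method="inclusive")
    return cuts[-1] - cuts[0]  # percentile 97.5 minus percentile 2.5

def required_sample_size(data, stat=statistics.fmean, precision=0.30,
                         max_n=2000, n_resamples=10_000, seed=0):
    """Smallest planned size (2..max_n) whose interval amplitude is
    <= precision * estimate; returns None if no size up to max_n qualifies."""
    rng = random.Random(seed)
    target = precision * stat(data)  # assumption: estimate from the full sample
    for n in range(2, max_n + 1):
        if ci_width(data, n, stat, n_resamples, rng) <= target:
            return n
    return None

# Made-up right-skewed values standing in for one trait measured on 225 plants
demo_rng = random.Random(42)
trait = [demo_rng.gammavariate(4, 1) for _ in range(225)]
print(required_sample_size(trait, statistics.fmean, 0.30, n_resamples=500))
print(required_sample_size(trait, statistics.median, 0.30, n_resamples=500))
```

Passing `statistics.median` instead of `statistics.fmean` gives the corresponding size for the median. A return value of `None` corresponds to the cases reported as not calculated (> 2,000 plants).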
For hybrid Hyola 433, the mean and percentiles 2.5 and 97.5 of the 10,000 means of dry matter of siliques (DMS) and of the 10,000 medians of the fresh matter of aerial part (FM) at each sample size were plotted on graphs. These two traits were chosen for plotting because they required the largest sample sizes. In the graphical representation of the mean and the limits of the 95% confidence interval, sample sizes were shown at intervals of forty plants for better visual representation. The graphs start at forty plants because smaller sizes are not of interest, since they have wide confidence intervals (low precision). The statistical analysis was performed using Microsoft Office Excel® and R software (R Development Core Team, 2018).

Results and Discussion
The mean of the PH, NS, FMS, FMWS, FM, DMS, DMWS, and DM traits was greatest in the Hyola 76 hybrid, intermediate in Hyola 433, and smallest in Hyola 61, which demonstrates the superior agronomic performance of the Hyola 76 hybrid (Table 1). In an assessment of six canola hybrids (Hyola 433, Hyola 50, Hyola 61, Hyola 76, Hyola 571 CL, and Hyola 575 CL), Rigon et al. (2017) concluded that the Hyola 76 hybrid presented a higher number of days of flowering, grain yield, and oil, evidencing the superior performance of this hybrid, as verified in our study.
Table 1. Minimum, percentiles 1, 2.5, and 25, median (percentile 50), percentiles 75, 97.5, and 99, maximum, range, mean, variance, standard deviation, standard error, coefficients of variation (CV), skewness, and kurtosis, and p-value of the Kolmogorov-Smirnov normality test of traits (1) measured in 225 plants of canola (Brassica napus L.) of each hybrid (Hyola 61, Hyola 76, and Hyola 433)
The high p-values of the Kolmogorov-Smirnov test for plant height of hybrids Hyola 61 (p-value = 0.871) and Hyola 433 (p-value = 0.677), together with asymmetry and kurtosis coefficients not different from zero, demonstrate a symmetric and mesokurtic distribution, respectively. Consequently, the mean is suitable as a measure of central tendency for these data. On the other hand, for plant height of the Hyola 76 hybrid, the low p-value of the Kolmogorov-Smirnov test (p-value = 0.008), the asymmetry coefficient lower than zero (negative asymmetry), and the kurtosis coefficient greater than zero (leptokurtic distribution) indicate that the median is more adequate than the mean as a measure of position. Therefore, determining the sample size for both mean and median estimation is relevant in this context.
The seven productive traits of the three hybrids (Hyola 61, Hyola 76, and Hyola 433) presented a mean greater than the median, asymmetry greater than zero, and a low p-value of the Kolmogorov-Smirnov test (p-value ≤ 0.144) (Table 1 and Figures 1, 2, and 3). This reveals that these traits do not fit the normal distribution and, consequently, that the median is more reliable than the mean as a measure of position. The positive asymmetry of the productive traits may be associated with possible outliers in the right tail. These outliers contribute to the mean being greater than the median. The possible outliers were kept in the study in order to represent the data variability, lending credibility to the sample size study. Removing outliers could underestimate the sample size.
Based on the range, variance, standard deviation, standard error, and coefficient of variation, the productive traits (NS, FMS, FMWS, FM, DMS, DMWS, and DM) showed greater variability than the morphological trait (PH) (Table 1). This suggests that a larger sample size is required for productive than for morphological traits.
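As a rough guide to why greater variability implies a larger sample size, the relation can be sketched with a normal approximation for the mean. This is an illustrative assumption, not the resampling procedure used in this study. The symbols are introduced here for illustration: A is the amplitude of the 95% confidence interval, s the standard deviation, n the number of plants, p the precision expressed as a fraction of the mean (0.15 to 0.30), and z the 97.5% standard normal quantile (about 1.96).

```latex
A = 2\, z_{0.975}\, \frac{s}{\sqrt{n}} \le p\,\bar{x}
\quad\Longrightarrow\quad
n \ge \left( \frac{2\, z_{0.975}\, \mathrm{CV}}{p} \right)^{2},
\qquad \mathrm{CV} = \frac{s}{\bar{x}}
```

Under this approximation the required sample size grows with the square of the coefficient of variation, so a trait with twice the CV needs about four times as many plants for the same precision.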
The sample size (number of plants) for mean estimation of the eight traits (PH, NS, FMS, FMWS, FM, DMS, DMWS, and DM) of canola hybrids with the amplitude of the 95% confidence interval equal to 15% of the mean estimate (higher precision) ranged from 46 to 685 for Hyola 61, 18 to 480 for Hyola 76, and 40 to 815 for Hyola 433, with the smallest value for plant height and the largest for dry matter of siliques in each hybrid (Table 2). At the other extreme, the sample size for mean estimation of the eight traits with the amplitude of the 95% confidence interval equal to 30% of the mean estimate (lower precision) ranged from 12 to 175 for Hyola 61, 5 to 125 for Hyola 76, and 10 to 208 for Hyola 433, again for plant height and dry matter of siliques, respectively. Sample size variability between traits for mean estimation was also verified in other agricultural crops, such as white lupine (Burin et al., 2014), forage turnip (Cargnelutti Filho et al., 2014), black oat (Cargnelutti Filho et al., 2015b), millet (Kleinpaul et al., 2017), rye (Bandeira et al., 2018a; Bandeira et al., 2018b), and flax (Cargnelutti Filho et al., 2018). For the three hybrids, larger sample sizes are required for the productive traits (NS, FMS, FMWS, FM, DMS, DMWS, and DM) than for the morphological trait (PH), which is explained by the greater CV values of the productive traits. Larger sample sizes for productive traits than for morphological traits were also obtained in white lupine (Burin et al., 2014), forage turnip (Cargnelutti Filho et al., 2014), black oat (Cargnelutti Filho et al., 2015b), and flax (Cargnelutti Filho et al., 2018).
(1) PH: plant height; NS: number of siliques; FMS: fresh matter of siliques; FMWS: fresh matter of aerial part without siliques; FM: fresh matter of aerial part; DMS: dry matter of siliques; DMWS: dry matter of aerial part without siliques; and DM: dry matter of aerial part. (2) The sample size was not calculated (> 2,000 plants).
For the same precision, the sample sizes for median estimation of the eight traits (PH, NS, FMS, FMWS, FM, DMS, DMWS, and DM) decreased in the following order of hybrids: Hyola 433, Hyola 61, and Hyola 76 (Table 3). These results reinforce the previous assertion that, in experiments with the three canola hybrids, Hyola 76 would be preferred, since this hybrid allows experiments with greater experimental precision (lower variation) than the Hyola 433 and Hyola 61 hybrids.
With the exception of plant height (PH) of the Hyola 76 hybrid, for which the sample sizes for mean and median estimation were similar, the remaining cases among the 24 combinations (3 hybrids × 8 traits) showed that a larger sample size is required for median estimation than for mean estimation (Tables 2 and 3). In white lupine (Burin et al., 2014) and in flax (Cargnelutti Filho et al., 2018), larger sample sizes were also found for median estimation than for mean estimation.
Considering the sample size variability between traits (PH, NS, FMS, FMWS, FM, DMS, DMWS, and DM), hybrids (Hyola 61, Hyola 76, and Hyola 433), and measures of central tendency (mean and median), we recommend choosing the desired precision and adopting the largest sample size for the measure of central tendency of interest (mean or median), so that all situations are covered.
For example, if the researcher decides to estimate the mean with a precision level of 30%, 208 plants would be sufficient for these eight traits and three hybrids (Table 2, Figure 4A). Thus, when planning a field experiment in a completely randomized design, 208 plants per treatment should be evaluated to estimate the mean of each treatment with 30% precision. If the experiment is planned with five replicates per treatment, 42 plants per replicate (208/5 = 41.6 ≅ 42) would be sampled, i.e., 42 plants per plot (210 plants per treatment). In addition, if 10 treatments were evaluated in the experiment, 2,100 plants would be sampled (210 per treatment). For the same precision (30% of the mean estimate), if the researcher designs the experiment only with the Hyola 76 hybrid (the better-performing hybrid, with more precision), 125 plants per treatment (125/5 = 25 plants per plot) would be enough for these eight traits (Table 2). In the experimental area, the plants would be taken at random in each plot of each treatment.
Larger sample sizes are required for the hybrids in this order: Hyola 433, Hyola 61, and Hyola 76.
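The allocation arithmetic in the example above (208 or 125 plants per treatment, five replicates, 10 treatments) can be reproduced with a short sketch. The function name `sampling_plan` is hypothetical and is used only for illustration; rounding up to whole plants per plot follows the 41.6 ≅ 42 step in the text.

```python
import math

def sampling_plan(plants_per_treatment, replicates, treatments):
    """Return (plants per plot, plants per treatment after rounding, total plants)."""
    per_plot = math.ceil(plants_per_treatment / replicates)  # round up to whole plants
    per_treatment = per_plot * replicates
    return per_plot, per_treatment, per_treatment * treatments

print(sampling_plan(208, 5, 10))  # (42, 210, 2100)
print(sampling_plan(125, 5, 10))  # (25, 125, 1250)
```

Rounding up per plot means the realized number of plants per treatment (210) slightly exceeds the required 208, so the target precision is still met.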
For mean estimation of canola traits of the Hyola 61, Hyola 76, and Hyola 433 hybrids with the amplitude of the 95% confidence interval equal to 30% of the estimated mean, 208 plants are required.
For median estimation of canola traits of the Hyola 61, Hyola 76, and Hyola 433 hybrids with the amplitude of the 95% confidence interval equal to 30% of the estimated median, 661 plants are required.

Figure 3. Frequency histograms of traits measured in 225 plants of canola (Brassica napus L.) hybrid Hyola 433. In the histograms, the line represents the normal distribution curve. Median (percentile 50), mean, Standard error (Sd), skewness, and p-value of the Kolmogorov-Smirnov normality test of the traits
Based on the percentiles, the proximity of the mean and median estimates, the asymmetry and kurtosis coefficients, the p-value of the Kolmogorov-Smirnov test, and the frequency histograms for the three hybrids (Hyola 61, Hyola 76, and Hyola 433), we observed that the morphological trait (plant height), except for plant height of the Hyola 76 hybrid, adhered better to the normal distribution than the seven productive traits (NS, FMS, FMWS, FM, DMS, DMWS, and DM) (Table 1; Figures 1, 2, and 3).
(1) PH: plant height; NS: number of siliques; FMS: fresh matter of siliques; FMWS: fresh matter of aerial part without siliques; FM: fresh matter of aerial part; DMS: dry matter of siliques; DMWS: dry matter of aerial part without siliques; and DM: dry matter of aerial part. For each trait, means of the Hyola 61, Hyola 76, and Hyola 433 hybrids not followed by the same letter in the column differ by Student's t-test (two-tailed) for independent samples at 5% significance, with 448 degrees of freedom. (2) * Asymmetry differs from zero by Student's t-test at 5% probability level. ns Non-significant. (3) * Kurtosis differs from zero by Student's t-test at 5% probability level. ns Non-significant.
Larger sample sizes are required for the productive traits in relation to the morphological trait.