Fitting Volume Models for Parana Pine With a Nonlinear Regression, Genetic Algorithm and Simulated Annealing

Improving volumetric quantification of Parana pine (Araucaria angustifolia) in Mixed Ombrophilous Forest is a constant need in order to provide accurate and timely information on current and future growing stock to ensure forest management. Thus, the present study aimed to evaluate and compare the volume estimates obtained through Nonlinear Regression (NR), Genetic Algorithm (GA) and Simulated Annealing (SA) in order to generate accurate volume estimates. Volumetric equations were developed including the independent variables diameter at breast height (dbh), total height (h) and crown rate (cr) and from the fit through the NR, GA and SA approaches. The GA and SA approaches evaluated proved to be a reliable optimization strategy for parameter estimation in Parana pine volumetric modelling, however, no significant differences were found in comparison with the NR approach. This study therefore contributes through the generation of robust equations that could be used for accurate estimates of the volume of the Parana pine in southern Brazil, thus supporting the planning and establishment of management and conservation actions.


Introduction
Timber valuation is a prime requirement to ensure forest management. In this regard, forest managers constantly resort to estimating forest attributes in order to provide accurate and timely information on current and future growing stock and to assess the economic benefits of their forests, not only at large scale industrial forests but also at small scale forests (Tiryana et al., 2021). Therefore, the development of accurate prediction tools is crucial to support forest management decisions, which have to be adapted to suit the particular circumstances (Özçelik et al., 2010). require initial parameter estimates to start the optimization and often the quality of the final solution is dependent upon the position of this starting point in the search space (Roush & Branton, 2005). Another difficulty with nonlinear models fitting is determining if a given minimum is the best (global) minimum or a suboptimal (local) minimum (R. L. Haupt & S. E. Haupt, 2004).
As an alternative to solve these problems, it is possible to resort to more robust optimization methods, such as metaheuristic algorithms approaches (Bonilla-Petriciolet et al., 2005). These methods present several advantages over traditional nonlinear regression (NR), such as the possibility of specifying only the range of the model parameters (Kapanoglu et al., 2007;Moreira et al., 2013). In addition, metaheuristic approaches are more appropriate to deal with ill-conditioned optimization problems than is nonlinear regression (Hadi & Gonzalez-Andujar, 2009).
In this regard, genetic algorithm (GA) is a computational method modeled on the theory of biological evolutionary processes that can be used to find optimal solutions (Roush & Branton 2005). The success of the algorithm is attributed to various factors including its powerful parallel search capability, computational simplicity, robustness and focus on global rather than local search space (Hadi & Gonzalez-Andujar, 2009;Pohjankukka et al., 2018;Zeng et al., 2007). Simulated annealing (SA) is another notable heuristic approach based on neighborhood search technique (Dong et al., 2015). It has analogy with annealing of a metal from which the name comes and the central idea of the method is to avoid local optima by accepting probabilistically moves to worse solutions (Packalén et al., 2012).
Thus, the present study expected to contribute to the increase of information on the volumetry of the Araucaria angustifolia in southern Brazil, with the objective of evaluating and comparing the volume estimates obtained through Nonlinear Regression (NR), Genetic Algorithm (GA) and Simulated Annealing (SA).

Study Site
The data were collected in an 84-hectare uneven-aged natural forest located in the municipality of Lages, SC (27°48′S; 50°19′W). The region's climate is mesothermic humid (1,360 to 1,600 mm), with no defined dry season (Cfb), with an average annual temperature between 13.8 and 15.8 ºC and a relative humidity of 80% (Alvares et al., 2013). The predominant soils in the region are humic nitosols and humic cambisols developed from basaltic rocks.

Data Collection
A total of 308 araucaria trees were selected in the forest by an entirely random sampling process within diameter classes (see the relative sampling error in the Table 1), covering the range of diametric distribution previously established by (Hess et al., 2010). In each tree were measured: the diameter at breast height (dbh) with a diametric tape; the total height (h) and the crown insertion height (hic) with a Vertex IV hypsometer. To obtain the diameters along the trunk, Hohenadl's rigorous cubing method was considered with the relative diameters (d i ) measured at specific heights (h i ) of 10%, 30%, 50%, 70% of the h, as well as the diameter at the crown insertion (d ic ), thus allowing the determination of the commercial volume (v) of the trees.
To determine the d i and h i values, the electronic dendrometer Criterion RD 1000 by laser technology® was used, which allowed the cubing of standing trees. In this case, the destructive method was not applied due to the restrictive legislation for cutting trees of this species, even for research purposes. The diameter at 90% of the height was not considered because it is located inside the canopy and the presence of branches hinder its determination in the forest.
The crown rate (cr) was obtained considering the ratio between the crown length (cl) and the total height (h). The cl was calculed by the difference between the total height (h) and the height at the crown insertion (hic) of the tree. The characteristics of the trees evaluated are shown in Table 1. Note. dbh = diameter at breast height (cm); h = total height (m); hic = crown insertion height; h/d = ratio between height and diameter (%); cr = crown rate (%); v = tree commercial volume (m³); S.D. = standard deviation; Rse = relative sampling error (%).
The nonlinear trend between the volume variable (v) and the diameter variable (dbh) can be seen through ( Figure  1). Figure 1. Dispersion of volume data as a function of dbh

Volume Modeling
The following volumetric models (i, ii and iii) were selected to be fitted through Nonlinear Regression, Genetic Algorithm and Simulated Annealing methods.
The fitting of the models through nonlinear regression was performed using the Gauss-Newton algorithm. For the optimization through Genetic Algorithm, the R package 'GA' was used, setting the following parameters: population size = 2500; maximum number of iterations = 50,000; crossover probability = 0.8; mutation probability = 0.1. For the Simulating Annealing implementation, the R package 'likelihood' was used and the following parameters were assumed: initial temperature = 3; cooling rate = 0.95; maximum number of iterations = 50,000.
Both GA & SA have the advantages of defining only a range of the equation parameters in the search space (R. L. Haupt & S. E. Haupt, 2004), which is not possible in the case of nonlinear regression in some specific programs that require starting points.

Relationships of Variables
The adjustment of the models of h and cr were possible by the following expressions: Where, h = total height (m); cr = crown rate (%); dbh = diameter at breast height (cm); β0; β1 = parameters to be estimated.
The relationship between the variables (v, dbh, h and cr) were analyzed using the three-dimensional representation with the response surface.

Statistical Analysis
In the evaluation of the models generated by the different approaches, we considered as goodness-of-fit criteria the root mean square error (RMSE) (Expression vi), the Bias (B) (Expression vii), the Akaike's information criterion (AIC) (Expression viii) and the graphical analysis of the residuals.

RMSE
Where, RMSE = root mean square error; B = Bias; AIC = Akaike's information criterion; y i = observed value for the i th observation; y i = predicted value for the i th observation; n = number of observations in the dataset; k = number of parameters.
To verify significant differences among the different fitting approaches, was tested the equality of variances, the F test was applied at the 5% level of significance. The non-linear extra sum of squares method was used to compare the models (i, ii and iii) (Bates & Watts, 1988). In this method, the fitting of full and reduced models is required. The significance of the comparison between full and reduced models is based on the F-test, according to the following expression: Where, S SER = is the sum square error of the reduced model; S SEF = is the sum square error of the full model; d fR = is the degrees of freedom for the reduced model; d fF = and is the degrees of freedom for the full model. The non-linear extra sum of squares follows an F-distribution. F-test was considered significant if the Pr-value for the test is less than 0.05. All the statistical analyses were processed using software R 3.4.4.

Results
The sample trees covered a wide range of diameter (9.8-86.0 cm) and height (7.2-25.0). In addition, from very large crown trees to trees with shorter crowns were included in the sample (5.1-75.0%). On average, the trees volume was 1.2118 m³ (Table 1 and Figure 1). Relative sampling error values were below or very close to 10% for all evaluated variables.
The coefficients fitted through NR, GA and SA, followed by their respective values RMSE, B and AIC, are presented in the showed better performance among those evaluated for the volumetric estimates. All the coefficients of the equations fitted presented significance (Pr < 0.0001, see Appendix A). Significant differences observed between models (i and ii; i and iii; ii and iii) based on the F-test using the non-linear extra sum of squares method, which allows the comparison between equations with different amount of coefficients, corroborate with the precision statistics and confirm the superiority of the model iii (Appendix B). Negative values for the b 3 coefficient were found due to the fact that a reduction in the percentage of crown rate reflects an increase in volume. Note. NLS = nonlinear regression; GA = genetic algorithm and SA = simulated annealing.
Additionally, regarding to the different fitting approaches, small differences in the adjusted values of the parameters were observed among nonlinear regression, genetic algorithm and simulated annealing (Table 3), resulting in very similar precision statistics. The comparative F test between the different volumetric estimation approaches proved that there was no significant difference between the approaches, at the 5% level of significance (Table 4). Therefore, considering the non-significant differences observed between the different approaches, for the data set and the models assessed in this study, it can be observed that non-linear regression proved to be a suitable approach for adjusting the variable volume as other approaches. Consequently, taking into account the simplicity, in this study we chose to select the non-linear regression approach for the application.
The box plots in Figure 2 illustrate the residual distributions of each model evaluated through nonlinear regression. Residual dispersion presents a shorter range and a small interquartile difference, especially regarding [NLS-iii. Model] (Figure 2), which also showed mean and median values centered close to zero.  Table 5. The trend of height (h) and crown rate (cr) as a function of the diameter (dbh) can be seen through (Figure 3). The three-dimensional volume data is shown in Figure 4a. Additionally, the relation among the independent variables cr, dbh and h can also be seen in Figure 4b. The generated equation (NLS-iii. Model) is the main result of the present study, which represents accurate volumetric estimates for individual trees of Araucaria angustifolia.

Discussion
This study investigates different approaches in adjusting volume models for the species Araucaria angustifolia in uneven-aged forest in southern Brazil, introducing new possibilities for parameter optimization through genetic algorithm and simulated annealing. Valuable contribution are delivered in the scope of improving volumetric estimates for the Brazilian pine species, covering in its sample trees in a wide diametric range, by including individuals from approximately 10 cm dbh to almost 90 cm (Table 1), which characterize the formations most commonly seen in southern Brazil.
The non-significant difference observed in the present study between the regression and metaheuristic algorithm approaches (Table 4) may have been verified due to the fact that we are working with simple models and with a maximum of three variables (Expression i, ii and iii). In this context, when ill-conditioned optimization is not an issue and the definition of the starting points' values are reasonable, the simplest method could be used to estimate the volume of Parana pine in southern Brazil, obtaining a good performance with the use of nonlinear regression (see Table 3-lower values of RMSE, bias and AIC).
It is to be expected, however, that for more complex problems, these metaheuristics alternatives can become quite viable. In this sense, given the advantages they can present, we point out to the importance of additional investigations related to these approaches in the context of forest attributes estimates, as they are still very scarce.
Some other authors have based studies with the application of metaheuristic approaches in the forest context, but in particular with the approach of selecting variables for the composition of the models (feature selection). Garcia-Gutierrez et al. (2014) compare the results of two classical procedures (stepwise and best-subset) and a novel GA regression procedure for the selection of variables when multiple linear regression is applied on LiDAR data for the estimation of the main forest stand variables. The results indicated that GAs statistically outperformed the rest of the methods and the authors suggested that the parametric conditions in field data could be the main reason for the better performance of GA.  Vol. 14, No. 2; In addition, genetic algorithms and simulated annealing also have their consolidated use in the context of forest planning, at all levels: strategic, tactical, and operational (Sessions et al. 2007). For example, Dong et al. (2015) concentrated on tactical forest spatial harvest scheduling problems through the use of the simulated annealing algorithm. Similarly, Zeng et al. (2007) explored heuristic optimization in risk management of wind damage in forest planning, testing simulated annealing, genetic algorithms and tabu search. In this study, tabu search performed slightly better than simulated annealing and genetic algorithms, addressing another important optimization algorithm.
Specifically for the volumetric modeling of the Parana pine, other authors have also been trying to increase the reliability of the estimation tools for the species. Martins et al. (2017) tested the performance of artificial intelligence techniques to assess their contribution in the estimation of stem form of Araucaria angustifolia and identified that artificial neural networks provided the best estimates. Similarly, mixed nonlinear models fitting techniques were also evaluated by Costa (2014) to describe the taper of Parana pine stems, which showed flexibility and efficiency.
In the present study, the possibility of including the crown rate variable (cr) in the model proved to be of great contribution, representing gains in precision in the adjustments (Table 3, Appendices A and B), regardless of the approach considered (Table 4). Testing the contribution of crown variables to volume models is particularly relevant in a current context of forest inventory via remote imaging (Dalponte et al., 2014;Y. Li et al., 2017;Puliti et al., 2015Puliti et al., , 2020. In this perspective, the combination of laser scanning technologies and the use of Unmanned Aerial Vehicles (UAV) have increasingly advanced in the direction of individual tree crowns identification and delineation (Guerra-Hernández et al., 2019;Puliti et al., 2020), therefore, the studied relationship between volume and crown variable represents a differential of the present study and a useful tool for future studies using UAV.
Furthermore, the volume determination obtained with a non-destructive method (Criterion RD1000) also proved to be feasible, allowing the sampling of 308 A. angustifolia trees without felling. Indirect volume measurement have been used satisfactorily in other recent studies (de Oliveira et al., 2018;He et al., 2016;Marchi et al., 2020), proven to be a cheaper and desirable method. de Oliveira et al. (2018) reached non-significant differences between volumes determined by direct and indirect methods for Khaya ivorensis A. Chev. plantations in Brazil, assuring precision for indirect measurements. Rodriguez et al. (2014) also compared the volume estimations obtained with destructive methods against the volume estimation obtained with the electronic dendrometer Criterion RD1000 and the laser hypsometer TruPulse, for Pinus nigra Arn. and Pinus pinaster Ait. Mesogeensis trees. These authors highlight the need for measurements to always be taken from a distance approximately equal to the tree height, considering this as an ''accurate position''. For A. angustifolia, this alternative is especially important in the current context of cutting restrictions established by legislation, even for research purposes. In this way it was possible to take a robust sample, leading to reliable results.
Finally, we consider that the final application of this study is the delivery of robust volume equations, which covers a large data amplitude and with multiple-entry (dbh, h and cr), that could be used for accurate estimates of the volume of the Parana pine in southern Brazil.

Conclusion
The genetic algorithm and simulated annealing approaches evaluated proved to be a reliable optimization strategy for parameter estimation in Parana pine volumetric modelling.
No significant differences were found between the volumes estimated by the different approaches, indicating that the simplest method, nonlinear regression, can be used to estimate the volume of Parana pine. In addition, the inclusion of the crown rate variable demonstrated a gain in model accuracy.
This study contributes by generating robust equations that could be used for accurate estimates of the volume of Parana pine, thus contributing to species management and conservation strategies.