Artificial Neural Network and Multivariate Models Applied to Morphological Traits and Seeds of Common Bean Genotypes

This study aimed to characterize common bean genotypes using multivariate models and artificial neural networks based on agronomic attributes and seed dimensions. The experiment was conducted in the 2017/2018 crop season in Tenente Portela, RS. The experimental design was augmented blocks, with 53 segregating F2 populations and ten cultivars used as checks, arranged in four replications. The accurate characterization of bean genotypes can be based on the reproductive period, cycle and seed length. Genotypes with a longer cycle increase the potential number of branches, legumes and seeds per plant and increase seed yield regardless of the commercial group. The biometric approach reveals patterns for genotype grouping, and the magnitude of these patterns depends on the intrinsic premises of the Standardized Average Euclidean Distance, Tocher optimized grouping and Artificial Neural Networks with unsupervised learning. Artificial Neural Networks are decisive for defining associative patterns, and these inferences are indispensable for selecting common bean genotypes that meet the desired agronomic attributes and seed production.


Introduction
The common bean (Phaseolus vulgaris L.) is a leguminous species belonging to the Fabaceae family. It is important for human nutrition and stands out as a major crop in the Brazilian food base, being cultivated in most regions of Brazil (Moura et al., 2013; Pedó et al., 2016). It is grown under several technological levels, from subsistence agriculture to large-area properties (Carvalho et al., 2016). Its cultivation has social, economic and agricultural peculiarities, since it is a low-cost source of protein, has a short cycle and, with adequate management, can be grown at different periods within the summer of the same crop season. This results in a sustainable system; on the other hand, nutritional and water management are required for both seed and grain production (Farinelli & Lemos, 2010; Demari et al., 2015).
Many factors can influence the characterization and selection of the most agronomically suitable genotypes in order to increase seed productivity per plant and per unit area. Among these factors are the soil and climatic characteristics of the growing environment, crop season and sowing period, management and technologies used, morphological and physiological plant characteristics, growth habit, commercial group and the intrinsic genetic constitution of the genotypes (Torres et al., 2015; Troyjack et al., 2017). In this context, given the varied growing conditions to which common beans are subjected and the large genetic variability available in this species, it is necessary to characterize the germplasm for genetic similarity and dissimilarity with commercial cultivars, based on the main agronomic attributes of the crop and seeds. In this way, it may be possible to select superior genotypes that meet the crop ideotype desired by the farmer.
To reduce financial costs and speed up selection in the development of new common bean genotypes, an efficient alternative is to employ different biometric approaches that allow genetic patterns to be estimated and classified from phenotypic measurements (Carbonell et al., 2007; Szareski et al., 2017). Accordingly, the aim is to combine methods that reveal the similarity of germplasm origins based on geographic distinctions, as well as their tendency of association, combining genetic dissimilarity with Artificial Neural Networks developed through machine learning. In this context, this work aimed to characterize common bean genotypes using multivariate models and artificial neural networks based on agronomic attributes and seed dimensions.

Material and Methods
The experiment was conducted in the 2017/2018 crop season in Tenente Portela, RS, located at latitude 27°23′31.04″S and longitude 53°46′50.71″W, at an altitude of 420 meters. The climate is humid subtropical, type Cfa according to the Köppen classification, and the soil is classified as a typical alumino-ferric Red Latosol (Streck et al., 2008). The common bean genotypes were classified according to the geographic origin of the germplasm (Campos Borges-RS, Palmeira das Missões-RS, Santa Rosa-RS, Pejuçara-RS, Braga-RS, Cruz Alta-RS, Santo Antônio de Goiás-GO, São Paulo-SP, Uberlândia-MG and Londrina-PR), the commercial group (Roxo, Rajado, Preto and Carioca) and the genetic base (segregating populations and cultivars); detailed information is given in Table 1. The experimental units consisted of five sowing rows, each five meters long, spaced 0.45 meters apart. The sowing density was 22 seeds per square meter (m²). Sowing was carried out in the second fortnight of November under a no-till system. Fertilization consisted of 250 kg ha⁻¹ of N, P₂O₅ and K₂O in the 10-20-20 formulation, plus 90 kg ha⁻¹ of nitrogen from urea (46% N) broadcast at the V4 growth stage. Preventive practices were prioritized to minimize the effects of weeds, pests and diseases that could influence the experimental results. The agronomic characters were measured on 10 plants randomly harvested from the area (4.05 m²) of each experimental unit, as follows:
- Duration of vegetative period (PVE): number of days from sowing to complete flowering, results in days.
- Duration of reproductive period (PRE): number of days between complete flowering and harvest, results in days.
- Cycle duration (CIC): period between sowing and harvest, results in days.
- Plant height at flowering (ALF): distance from the soil level to the top of the plant, results in centimeters (cm).
- Plant height at harvest (ALM): distance from the soil level to the top of the plant, measured at harvest, results in cm.
- First legume insertion height (IPL): distance between the soil level and the first viable legume of the plant, results in cm.
- Number of legumes per plant (NPL): count of viable legumes per plant, results in units.
- Number of seeds per legume (NSL): obtained as the ratio between the mass of seeds per plant and the total number of legumes, results in grams (g).
- Number of branches per plant (NR): count of branches longer than ten centimeters and bearing legumes, results in units.
- Seed mass per plant (MSP): after threshing the plants, the seeds were cleaned and weighed on a precision scale, results in grams (g).
- Seed length (COS) and width (LAS): measured on 100 seeds per experimental unit using a digital caliper, results in millimeters (mm).
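The fertilization rates and period definitions above reduce to simple arithmetic. The following is a minimal sketch, assuming that the 10-20-20 formulation gives the percentage by mass of N, P2O5 and K2O, and that the cycle equals the vegetative plus the reproductive period, as the definitions of PVE, PRE and CIC imply. The function names are illustrative and do not come from the original study.

```python
def nutrient_rates(fertilizer_kg_ha, formulation=(10, 20, 20)):
    """Return kg/ha of N, P2O5 and K2O supplied by an N-P-K formulation (% by mass)."""
    n, p, k = formulation
    return {
        "N": fertilizer_kg_ha * n / 100,
        "P2O5": fertilizer_kg_ha * p / 100,
        "K2O": fertilizer_kg_ha * k / 100,
    }


def urea_needed(n_kg_ha, n_fraction=0.46):
    """Urea dose (kg/ha) needed to supply a nitrogen rate, given the N content of urea."""
    return n_kg_ha / n_fraction


def cycle_duration(pve_days, pre_days):
    """CIC (sowing to harvest) as PVE (sowing to flowering) plus PRE (flowering to harvest)."""
    return pve_days + pre_days


base = nutrient_rates(250)          # 250 kg/ha of 10-20-20 at sowing
topdress = urea_needed(90)          # 90 kg/ha of N from urea at V4
print(base, round(topdress, 1))
```

Under these assumptions the base fertilization supplies 25 kg ha⁻¹ of N and 50 kg ha⁻¹ each of P2O5 and K2O, and the nitrogen topdressing corresponds to roughly 196 kg ha⁻¹ of urea.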
The geographic coordinates were used to define the exact location of the germplasm origin environments. These physical distances were used in a principal coordinate analysis to graphically express the dispersion of the environments through the geographic distance matrix. Afterwards, the phenotypic data were tested for normality and homogeneity of variances in order to identify discrepant observations. The data were then used to compute the standardized average Euclidean distance and build the genetic dissimilarity matrix of the genotypes; this matrix was submitted to Unweighted Pair Group Method with Arithmetic Mean (UPGMA) clustering, from which the dendrogram was constructed. Next, the Tocher optimized grouping method was applied, prioritizing homogeneity within groups and heterogeneity between groups. In addition, the relative contribution of the characters was estimated by the method of Singh (1981) to understand which characters were decisive for distinguishing the tested genotypes.
To understand the linear associations between characters and build a causal diagram, linear correlations were estimated, and the significance of the coefficients was assessed by the t test at 5% probability. To identify patterns for selection and determine which measured characters were associated with each pattern, Artificial Neural Networks (ANNs) with unsupervised learning were used; the associated centroids and the topology of the neurons were obtained by Kohonen mapping with 200 thousand iterations. The statistical analyses were performed using the software Genes (Cruz, 2009) and R (R Core Team, 2015).
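The distance and UPGMA steps described above can be sketched as follows. This is an illustration under stated assumptions, not the routine used in Genes: the trait matrix `X` is random placeholder data, each trait is scaled by its sample standard deviation, and the distance between two genotypes is the square root of the mean squared difference over traits. SciPy's `linkage` with `method="average"` corresponds to UPGMA.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist, squareform


def standardized_mean_euclidean(X):
    """Condensed vector of standardized average Euclidean distances between rows of X."""
    X = np.asarray(X, dtype=float)
    Z = X / X.std(axis=0, ddof=1)          # scale each trait by its standard deviation
    # Euclidean distance divided by sqrt(number of traits) averages over traits
    return pdist(Z, metric="euclidean") / np.sqrt(X.shape[1])


rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))                # hypothetical: 8 genotypes x 4 traits

d = standardized_mean_euclidean(X)
D = squareform(d)                          # full symmetric dissimilarity matrix
tree = linkage(d, method="average")        # UPGMA clustering
groups = fcluster(tree, t=3, criterion="maxclust")   # cut the tree into at most 3 groups
print(groups)
```

The cut into three groups is arbitrary here; in practice the number of groups is read from the dendrogram or set by a cutoff criterion.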
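The significance test for the correlation coefficients can be sketched as below. It assumes a Pearson coefficient r tested with t = r·sqrt((n − 2)/(1 − r²)) on n − 2 degrees of freedom, and it assumes that n is the number of genotypes (63); the source does not state the sample size used in the test. The function names are illustrative.

```python
import math

from scipy import stats


def correlation_t_test(r, n):
    """Two-sided t test for a Pearson correlation r estimated from n observations."""
    df = n - 2
    t_stat = r * math.sqrt(df / (1.0 - r * r))
    p_value = 2.0 * stats.t.sf(abs(t_stat), df)
    return t_stat, p_value


def critical_r(n, alpha=0.05):
    """Smallest |r| that is significant at level alpha for n observations."""
    df = n - 2
    t_crit = stats.t.ppf(1.0 - alpha / 2.0, df)
    return t_crit / math.sqrt(t_crit ** 2 + df)


n_pairs = math.comb(13, 2)                 # 13 characters give 78 pairwise correlations
print(n_pairs, round(critical_r(63), 3))
print(correlation_t_test(0.5, 63))
```

Under these assumptions, any coefficient whose absolute value exceeds `critical_r(63)` would enter the causal diagram.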

Results and Discussion
Given the variability in germplasm origin in genotype differentiation studies, it is crucial to employ biometric models that reveal this variation comprehensively. Principal coordinate analysis relies on the independence between the axes, which supports the schematic representation of non-linear associations, and its estimates are based on the similarity among the observations used. In this context, the geographic location of the origin of the studied genotypes (F2 populations and cultivars) was used to build the distance matrix. Then, the eigenvalues that defined the relative scores (X and Y) for each origin were estimated.
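The construction described above, from coordinates to a distance matrix to eigenvalues and X and Y scores, can be sketched with classical principal coordinate analysis. The coordinates below are hypothetical placeholders rather than the origins used in the study, and the great-circle (haversine) distance is an assumption, since the source does not say how geographic distances were computed.

```python
import numpy as np


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between points given in decimal degrees."""
    p1, p2 = np.radians(lat1), np.radians(lat2)
    dphi = p2 - p1
    dlmb = np.radians(lon2) - np.radians(lon1)
    a = np.sin(dphi / 2) ** 2 + np.cos(p1) * np.cos(p2) * np.sin(dlmb / 2) ** 2
    return 2 * radius_km * np.arcsin(np.sqrt(a))


def principal_coordinates(D, n_axes=2):
    """Classical PCoA: double-center the squared distances and take the leading eigenpairs."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n            # centering matrix
    B = -0.5 * J @ (D ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)           # ascending order
    order = np.argsort(eigvals)[::-1][:n_axes]     # keep the largest eigenvalues
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale, eigvals[order]


# Hypothetical (latitude, longitude) pairs, for illustration only
coords = np.array([[-28.0, -53.0], [-28.5, -53.5], [-23.5, -46.5], [-16.5, -49.5]])
lat, lon = coords[:, 0], coords[:, 1]
D = haversine_km(lat[:, None], lon[:, None], lat[None, :], lon[None, :])

scores, eigvals = principal_coordinates(D)         # columns are the X and Y scores
print(scores)
```

Each row of `scores` gives the position of one origin on the first two principal coordinates, which is what a plot such as Figure 1 displays.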
The principal coordinates plot (Figure 1) represents the variation among nine distinct origins of the common bean genotype seeds. High similarity is found among the origins Cruz Alta-RS (CA), Palmeira das Missões-RS (PM) and Pejuçara-RS (PJ). Although located in the same state, Campos Borges-RS (CB) and Santa Rosa-RS (SR) diverge from this group because, while not contrasting environments, they are geographically distant. The more discrepant origins correspond to check cultivars from breeding programs located in the South, Southeast and Central-West regions. Although placed in the same quadrant, São Paulo-SP (SP) and Uberlândia-MG (UB) differed in location, a behavior similar to that observed for Londrina-PR (LO) and Santo Antônio de Goiás-GO (SA).
Even accounting for the differences in germplasm origin, commercial group and intrinsic genetic base of the studied genotypes, many of them are phenotypically similar, which makes it necessary to differentiate them efficiently. Some statistical alternatives were therefore used to support this differentiation, first by establishing which characters are essential for it. The relative contribution of characters is an efficient method for establishing which measured characters are relevant (Figure 2). Here, the greatest polymorphism among genotypes was expressed in seed length (14.5%), ranging from 8.5 to 14.1 mm, duration of the reproductive period (13.3%), from 38 to 74 days, and cycle duration (13.3%), from 68 to 116 days. These were the most contrasting characters among the genotypes and must therefore be considered to differentiate genotypes classified as similar by origin, commercial group or genetic base.
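The relative contribution of each character can be sketched as below. The source does not give the exact computation, so this assumes the common formulation of Singh's criterion, in which each pairwise generalized distance d'Wd (with W the inverse of a residual covariance matrix) is split into per-character terms d_j·Σ_j' w_jj'·d_j' and summed over all genotype pairs. The input values and the function name are hypothetical.

```python
import numpy as np


def singh_contribution(means, resid_cov):
    """Percentage contribution of each character to the total pairwise divergence."""
    means = np.asarray(means, dtype=float)
    W = np.linalg.inv(np.asarray(resid_cov, dtype=float))
    n, p = means.shape
    S = np.zeros(p)
    for i in range(n - 1):
        for k in range(i + 1, n):
            d = means[i] - means[k]
            S += d * (W @ d)          # element j holds d_j * sum_j' w_jj' * d_j'
    return 100.0 * S / S.sum()


# Hypothetical genotype means (4 genotypes x 3 characters) and residual covariance
means = np.array([[10.2, 45.0, 80.0],
                  [12.9, 60.0, 95.0],
                  [ 9.1, 52.0, 88.0],
                  [13.8, 70.0, 110.0]])
resid_cov = np.diag([0.5, 20.0, 30.0])

contrib = singh_contribution(means, resid_cov)
print(np.round(contrib, 1))
```

With correlated characters individual terms can be negative, so percentages computed this way are best read as a ranking aid rather than as exact shares.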
Once the most relevant characters are known, it is indispensable to define which of them are linearly associated. The main yield components of common beans are defined by the joint association of several characters, and these combinations often act in opposite directions. Linear correlation analysis was therefore performed to establish the tendency of the measured characters and to build a causal diagram from the associations significant at 5% probability by the t test. Thirteen characters measured in the 63 common bean genotypes were correlated, giving 78 linear associations, of which only 10 correlation coefficients were significant (Figure 3) and considered suitable for building the causal diagram.
Artificial Neural Networks (ANNs) are indispensable for defining mathematical patterns in non-linear stochastic phenomena. The network topology is defined by an unsupervised iterative computational process, in which the number of network inputs corresponds to the number of neurons associated with the explanatory power of the model. When related, these neurons allow the definition of a centroid, which corresponds to the midpoint among the existing associations (Nascimento et al., 2013; Teodoro et al., 2015). In this study, the Kohonen mapping method was used with 63 inputs (neurons) and more than 200 thousand iterations (Figure 5). The phenotypic matrix was submitted to iterative procedures that defined a neural network with a topology of 20 centroids establishing associative patterns among the tested genotypes (Figure 5), with 31 synaptic connections required to interconnect the centroids.
The tendencies expressed in this study were essential for identifying which biometric approach is most adequate for characterizing genotypes from phenotypic measurements. Seed length was the most contrasting character for differentiating the genetic constitutions, and seed productivity is directly related to genotypes with longer reproductive period and cycle, because these attributes increase the yield components. The multivariate approaches based on genetic distance estimate and define different genotype patterns: the dendrogram expresses 14 groups and the Tocher optimized method suggests 10 subdivisions, but the Artificial Neural Networks with unsupervised learning stratified the genotypes best, basing their inferences on 20 centroids established by 63 neurons and 31 synaptic associations.
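A Kohonen map of the kind described above can be sketched in a few lines of NumPy. This is an illustration under assumptions and not the configuration used in the study: the data are random placeholders (63 genotypes by 12 standardized traits), the learning rate and neighborhood width decay linearly, and the map is laid out as a 5 × 4 rectangular grid. That layout is a guess; it is chosen because such a grid has 20 units and 31 links between adjacent units, which matches the counts reported.

```python
import numpy as np


def grid_links(rows, cols):
    """Number of links between adjacent units in a rectangular grid."""
    return rows * (cols - 1) + cols * (rows - 1)


def train_som(X, grid=(5, 4), n_iter=2000, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small self-organizing map and return the unit weight vectors (centroids)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    rows, cols = grid
    pos = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    W = X[rng.integers(0, len(X), size=rows * cols)].copy()   # initialize from data rows
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                     # decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 1e-3        # shrinking neighborhood width
        x = X[rng.integers(len(X))]                 # one random genotype per step
        bmu = np.argmin(((W - x) ** 2).sum(axis=1)) # best matching unit
        g = np.exp(-((pos - pos[bmu]) ** 2).sum(axis=1) / (2.0 * sigma ** 2))
        W += lr * g[:, None] * (x - W)              # pull neighbors toward the sample
    return W


def assign_units(X, W):
    """Index of the closest unit for each row of X."""
    d = ((np.asarray(X, dtype=float)[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


rng = np.random.default_rng(1)
X = rng.normal(size=(63, 12))                       # hypothetical standardized traits
W = train_som(X)                                    # the study reports about 200 thousand steps
labels = assign_units(X, W)
print(grid_links(5, 4), np.bincount(labels, minlength=20))
```

Genotypes mapped to the same unit share an associative pattern, and the unit's weight vector shows which traits characterize that pattern.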
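To make the Tocher grouping compared above concrete, here is a minimal sketch of the procedure as it is commonly described. The threshold θ is the largest of the nearest-neighbor distances, each group starts from the closest remaining pair, and a genotype joins a group while its average distance to the group members does not exceed θ. The routine in Genes may differ in detail. The handling of leftover genotypes as single-member groups is a choice of this sketch, and the distance matrix is hypothetical.

```python
import numpy as np


def tocher(D):
    """Group items from a symmetric distance matrix; returns (groups, theta)."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    off = D + np.diag(np.full(n, np.inf))        # mask self-distances
    theta = off.min(axis=1).max()                # largest nearest-neighbor distance
    remaining = list(range(n))
    groups = []
    while remaining:
        if len(remaining) == 1:
            groups.append(remaining)
            break
        sub = off[np.ix_(remaining, remaining)]
        a, b = np.unravel_index(np.argmin(sub), sub.shape)
        if sub[a, b] > theta:                    # no remaining pair is close enough
            groups.extend([g] for g in remaining)
            break
        group = [remaining[a], remaining[b]]     # start from the closest pair
        remaining = [g for g in remaining if g not in group]
        while remaining:
            avg = [D[k, group].mean() for k in remaining]
            best = int(np.argmin(avg))
            if avg[best] > theta:                # adding anyone would exceed theta
                break
            group.append(remaining.pop(best))
        groups.append(group)
    return groups, theta


# Hypothetical dissimilarities among five genotypes
D = np.array([[0.0, 0.4, 0.5, 2.0, 2.2],
              [0.4, 0.0, 0.6, 2.1, 2.3],
              [0.5, 0.6, 0.0, 1.9, 2.4],
              [2.0, 2.1, 1.9, 0.0, 0.7],
              [2.2, 2.3, 2.4, 0.7, 0.0]])
groups, theta = tocher(D)
print(groups, theta)
```

Because θ is set by the most isolated genotype, a single distant genotype raises the threshold and lets the first group absorb many members, which is one plausible reading of why a large share of genotypes can end up in group I.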

Conclusions
The accurate characterization of bean genotypes can be based on the reproductive period, cycle and seed length. Genotypes with a longer cycle increase the potential number of branches, legumes and seeds per plant and increase seed yield regardless of the commercial group. The biometric approach reveals patterns for genotype grouping, and the magnitude of these patterns depends on the intrinsic premises of the Standardized Average Euclidean Distance, Tocher optimized grouping and Artificial Neural Networks with unsupervised learning. Artificial Neural Networks are decisive for defining associative patterns, and these inferences are indispensable for selecting common bean genotypes that meet the desired agronomic attributes and seed production.


characters, and its inferences are based on the definition of groups that prioritize homogeneity within groups and heterogeneity between groups (Santo et al., 2014). This method split the 63 genotypes into 10 groups, with 73% of the genotypes concentrated in the first group (I), of which 11% are commercial cultivars. This group gathers the genotypes whose agronomic characteristics are closest to the prioritized ones (Table 2); under these conditions, PJP38 and BRS MG Realce are characterized as belonging to the Rajado commercial group. The remaining groups (III, IV, V, VI, VII, VIII, IX and X) represent 13.1% of the genotypes, and their distinctions are based on small amplitudes in the average results; if there is interest in selecting specific characteristics for the next generations, these groups can be prioritized.

Table 2. Optimized grouping method of Tocher to define groups through the genotype dissimilarities