Knowledge Economy in Brazil: Analysis of Sectoral Concentration and Production by Region

The research presented in this article investigates the concentration of knowledge production in Brazil at postgraduate level, in the context of a public policy, using spectral methods grounded in the location quotient (LQ) and concentration index (CI) indicators, in three dimensions, from 2013 to 2018. The dimensions are economics, geography, and time. Economics is represented by the Fields and Major Fields of knowledge production. Geography corresponds to the regions identified by each Federation unit (FU). Time is a chronological unit of the timeline in which knowledge is produced. The research then evaluates the effect of knowledge concentration on the income performance of families by FU. The results are robust and provide significant evidence that sectoral knowledge production in Brazil is regionally unequal and affects family incomes, but that those incomes also evolve regardless of the level of knowledge concentration. The research contributions can assist public policy regulators and monitoring managers, and encourage future work in regional economics applications.


Introduction
This article investigates the concentration of knowledge produced in Brazil at postgraduate level, in the context of agglomerations. It is grounded on quantitative data on master's theses and doctoral dissertations, by Field and Major Field of knowledge and by federation unit (FU), within the timeline from 2013 to 2018. It also investigates the contribution of knowledge concentration to the income performance of families.
Field is the aggregate of all knowledge produced within the geography represented by a FU. Major Field aggregates the knowledge produced by all related Fields, in the same timeline unit. Knowledge concentration is a non-negative neutral measure that, when orbiting near zero, suggests that a FU is developed. Thus, there are three variables to evaluate the concentration of the knowledge economy: FU, which is equivalent to a geographical territory unit, named Region; Field and Major Field, which correspond to the sectors in which the economy produces knowledge; and time, which is a chronological unit of the timeline in which knowledge is produced. With these variables, a local and global analysis of knowledge concentration is produced.
To investigate the degree of knowledge concentration, the article introduces the concentration index (CI), which uses spectral analysis metrics, in the context of linear algebra theory, applied to the study of agglomerations, guided by the location quotient (LQ) model in three dimensions: economics, geography and time. These metrics indicate how equal or unequal knowledge production is in terms of regional development, and how concentrated or dispersed the knowledge produced by each region is. To evaluate knowledge concentration performance in the average household income (AHI) of families and the statistical significance of this performance, the Data Envelopment Analysis (DEA) deterministic model and the quantile linear regression model are used.
In the context of the premises presented, the research investigates how proportional, concentrated and efficient the knowledge produced in Brazil is, at a postgraduate level, considering the number of master's theses and doctoral dissertations, separated by Field, Major Field, region, and period, within the timeline defined from 2013 to 2018.
The actions necessary for the research are as follows: to retrieve the quantitative data on master's theses and doctoral dissertations defended, by FU and Field, from CAPES' theses and dissertations database, for each time unit, and group them by Major Field; to retrieve the AHI values from IBGE statistics, by FU and time unit; to calculate the LQ, the local and global percentages, and the CI, by FU, for each period of the timeline; to evaluate the fulfillment of targets defined by the regulator in market conditions (MC); to evaluate the AHI's performance according to knowledge concentration, by FU, Field and Major Field, in each period of the timeline; and to calculate and analyze the statistical significance of the quantile estimators of the DEA score. MC must be arbitrated by the regulator as a reference for the effectiveness of public policy. However, public policy must be understood not only as an action of the State, but as a set of actions that seek to build a real future (Boneti, 2017).
The results produced by the research allow managers and regulators to correct possible deviations in achieving the goals during the implementation of the public policy of knowledge production so that the planned objectives are met.
The article is structured in four other sections. Section 2 addresses the evolution of the agglomeration and knowledge production binomial, highlighting the relevant contributions of the literature from the end of the 19th century to the early 21st century. Section 3 presents the evolution of the methodology and the research model. Section 4 analyzes the knowledge economy in Brazil. Section 5 summarizes the main research contributions as well as the main limitations.

Agglomeration and Knowledge Production
The agglomeration and knowledge production binomial was studied by Feser, Renski, and Goldstein (2008), who evaluated subregional concentration of the research and development industry in 18 universities and other institutions in the Appalachian region, United States, as of the 1990s, with the objective of investigating its relationship with employment. They developed knowledge infrastructure indicators for the region, using the location quotient (LQ) model in two dimensions and other methods. The evidence from the study, as stated by the authors, suggests that agglomeration is associated with the creation of new businesses and not with employment growth.
In the context of knowledge as the foundation of a society's transformation, Hájková and Hájek (2014) used the DEA model to evaluate the impact of the knowledge bases of European cities on population and economic growth in the period from 2004 to 2009. The authors' evaluation indicates that the cities of the new member-states transform their knowledge bases into urban growth more efficiently than those from the EU-15, due to their low initial level of creation and systematic knowledge transfer, because of the economic convergence and internal knowledge transfer, which were supported by sales.
Knowledge, as a grouping engine of the industry, promotes the appearance of urban agglomerations, an increase in income levels and, consequently, in social welfare, as addressed by Marshall (1890, IV.X.3) and Gabe and Abel (2010). In this context, knowledge boosts the development of a region and/or a nation, as it builds the capacity and the skills of human capital (Romer, 1996) to increase and improve productivity, whether by the proper use of technology or by the rational use of natural resources, or even by the improvement of processes, as addressed by Quatraro (2010), Cooke et al. (1997), and Antonelli (2008). It may even be by interacting with and impacting other sectors of the economy, as suggested by the findings of Zabala-Iturriagagoitia, Voigt, Gutiérrez-Gracia, and Jiménez-Sáez (2007) in a similar study in the European Union.
In Brazil, the knowledge production industry, segregated by Major Field and Field, is spread out in the five macro-regions or geopolitical regions of interest of the education policy. This distribution gains emphasis in the Major Field and Field according to the direction of the local economic activity, where a region stands out in comparison with the others, creating a regional (local) economic characteristic. Examples of such characteristic include the prominence of agribusiness in the Central-West region; of tourism in the Northeast region; and of technology and mining in the Southeast and South regions.

Model Specifications
This section retrieves from the literature the development of the agglomeration measurement model and defines the research model.

Location Quotient (LQ)
LQ is a social indicator traditionally used in the literature of regional economics to measure the spatial effects of agglomerations. Seminal studies such as the one attributed to Haig (1920) and the one produced by Florence (1929) are evidence of that use. However, these studies present the LQ as being only two-dimensional, represented by economics and geography, which is a limitation to the use of this indicator in a matrix application.
Later studies, such as those of Alexander (1954), Tiebout (1962, p. 9), Richardson (1985), Stimson, Stough, and Roberts (2006, p. 107), Gabe and Abel (2010), and Thulin (2014), have also measured the effects of each sector in the spatial activity of the economy through the LQ in two dimensions, but other studies, such as those by North (1955) and Gilmer and Keil (1989), criticize the effectiveness of the results obtained through the analytical LQ model with the seminal specification.
The seminal conception of the two-dimensional LQ model is one-off and specific. Its two dimensions are therefore the restriction that limits the dynamic, matrix-based application of the model. The model attributed to Haig (1920) considers A to be a location with a specified industry; C to be a location that gathers the entire specified industry; B to be all the locations in which there are specified industries; and D to be all locations of all specified industries.
Along the same lines as the seminal two-dimensional LQ, Stimson, Stough, and Roberts (2006, p. 107) rewrote the model as follows:
$$LQ_{ir} = \frac{E_{ir} / E_{r}}{E_{iN} / E_{N}}$$
where $E_{ir}$ is employment in sector i in region r; $E_{r}$ is total employment in region r; $E_{iN}$ is employment in sector i in the reference area (N = national reference); and $E_{N}$ is total employment in the area of national reference.
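As a minimal sketch of the two-dimensional quotient described above, the Python function below computes the LQ from four employment totals. The function name and the sample figures are illustrative assumptions, not data from the study.

```python
def location_quotient(e_ir, e_r, e_in, e_n):
    """Two-dimensional LQ: regional sector share over national sector share."""
    if e_r == 0 or e_in == 0 or e_n == 0:
        raise ValueError("totals must be non-zero")
    regional_share = e_ir / e_r    # sector i's share of employment in region r
    national_share = e_in / e_n    # sector i's share of national employment
    return regional_share / national_share

# Hypothetical figures: 200 of 1,000 regional jobs against 1,000 of 10,000 national jobs.
lq = location_quotient(200, 1000, 1000, 10000)
```

A value above 1 indicates that the sector is over-represented in the region relative to the national reference; a value of 1 indicates proportional equivalence.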

Research Model
Realizing the limitation imposed upon the LQ in only two dimensions, De França (2020) and De França and Sosa (2020) proposed a matrix-based approach. This approach consists in adding a third variable: time. With the introduction of a new three-dimensional LQ, the context of linear algebra theory and spectral analysis is also introduced. With the three dimensions, the specification of the LQ removes the restriction present since the seminal studies and constitutes a consistent non-parametric social indicator that signals proportional equalities/inequalities between regions, as well as indicating concentration/dispersal in regional development, within a timeline, with vast application in the economy. The model is conceived based on a set of equations, as follows:

1) General three-dimensional LQ model and introduction of economic variable V
In Equation 1, the specification considers V to be the variable that brings together the three dimensions of the economy; k and j are the sectors of the economy; i and a second regional index are the regions of geography; and t is the unit of the timeline.
If the production by FU is proportionally equivalent, one expects LQ(k, i, t) ≡ 1.
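A three-dimensional LQ consistent with the definitions above can be sketched as the usual ratio of a regional share to a global share, evaluated at each unit of time. The symbol l for the second regional index and the arrangement of the sums are assumptions made for illustration.

```latex
LQ(k, i, t) =
  \frac{V(k, i, t) \,/\, \sum_{j} V(j, i, t)}
       {\sum_{l} V(k, l, t) \,/\, \sum_{j} \sum_{l} V(j, l, t)}
```

Under this form, a sector whose share of a region's production equals its share of total production yields a quotient of 1.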

2) Aggregation of the economy according to the V economic variable
a) Aggregation of the production by sector k in each unit of time t
The global percentage (GP) and the local percentage (LP) allow regulators and public policy managers to follow up and monitor the fulfillment of the target defined in the market condition (MC), within the timeline for which the target was established.
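The two percentages can be illustrated with a small Python sketch. It assumes that the GP is a sector's share of all production in a unit of time and that the LP is a sector's share of the production of one region; both readings, the function names, and the sample counts are assumptions for illustration.

```python
# Hypothetical document counts for one unit of time: production[region][sector].
production = {
    "FU_A": {"S1": 30, "S2": 10},
    "FU_B": {"S1": 20, "S2": 40},
}

def global_percentage(production, sector):
    """Share of one sector in the production of all regions and sectors."""
    total = sum(sum(sectors.values()) for sectors in production.values())
    sector_total = sum(sectors.get(sector, 0) for sectors in production.values())
    return 100.0 * sector_total / total

def local_percentage(production, region, sector):
    """Share of one sector in the production of a single region."""
    region_total = sum(production[region].values())
    return 100.0 * production[region].get(sector, 0) / region_total

gp_s1 = global_percentage(production, "S1")            # 50 of 100 documents
lp_s1_a = local_percentage(production, "FU_A", "S1")   # 30 of 40 documents
```

Comparing each percentage against a target set in the MC is then a direct numeric comparison per sector and unit of time.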

a) LQ restrictions according to V based on GP and LP
The LQ calculated by the model of Equation 1 is rewritten by Equation 4. This is necessary to show that, when the mathematical limitation of the division by zero takes place, the LQ is defined as zero. This occurs when a sector does not produce in any region.
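The zero convention can be sketched in Python with NumPy. The array layout `v[sector, region, time]` and the arrangement of shares follow the standard LQ ratio and are assumptions; only the rule that an undefined quotient is set to zero comes from the text.

```python
import numpy as np

def lq_3d(v):
    """LQ for an array v[sector, region, time]; undefined ratios are set to 0."""
    v = np.asarray(v, dtype=float)
    region_total = v.sum(axis=0, keepdims=True)       # all sectors, per region and time
    sector_total = v.sum(axis=1, keepdims=True)       # all regions, per sector and time
    grand_total = v.sum(axis=(0, 1), keepdims=True)   # all production, per time
    local_share = np.divide(v, region_total,
                            out=np.zeros_like(v), where=region_total != 0)
    global_share = np.divide(sector_total, grand_total,
                             out=np.zeros_like(sector_total), where=grand_total != 0)
    # A sector with no production anywhere has global_share == 0, so its LQ stays 0.
    return np.divide(local_share, global_share,
                     out=np.zeros_like(v), where=global_share != 0)

# Hypothetical data: 2 sectors, 2 regions, 1 unit of time; sector 1 produces nothing.
v = np.array([[[10.0], [20.0]],
              [[0.0], [0.0]]])
lq = lq_3d(v)
```

Using the `out` and `where` arguments avoids division warnings and leaves the pre-filled zeros in place wherever the denominator vanishes.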

b) LQ (LQRM) and Covariance (COVM) matrices
The elements of each LQRM are defined based on the LQ indexes in the three dimensions. An LQRM of size S×P is defined for each region i and period p and, from each of those matrices, a COVM of size P×P is defined. For both matrices, S represents the sectors of the economy (Field and Major Field), and P represents the units of time of each period p, which is defined by P intervals of units of time. Thus, Equation 5 defines the elements of each LQRM and Equation 6 defines the elements of each COVM.
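The construction can be sketched in Python with NumPy. The sketch assumes that the LQRM for a region and period is the S×P slice of the LQ array over the P units of time of that period, and that the COVM is the sample covariance between its P columns; Equations 5 and 6 may define the elements differently, and the random LQ values are hypothetical.

```python
import numpy as np

def lqrm(lq, region, start, width):
    """S x P matrix of LQ values for one region over `width` units of time."""
    return np.asarray(lq, dtype=float)[:, region, start:start + width]

def covm(lqrm_matrix):
    """P x P sample covariance between the time columns of an LQRM."""
    return np.cov(lqrm_matrix, rowvar=False)

# Hypothetical LQ array: 3 sectors, 2 regions, 6 units of time.
rng = np.random.default_rng(0)
lq = rng.uniform(0.5, 1.5, size=(3, 2, 6))
m = lqrm(lq, region=0, start=0, width=4)   # first period, P = 4
c = covm(m)
```

Because the covariance matrix is symmetric, its eigenvalues are real, which is what the spectral step relies on.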

c) Eigenvalues () and CI
ijef.ccsenet.org International Journal of Economics and Finance Vol. 13, No.11;2021 Each CI is defined by a vector of eigenvalues ( ) from a . The CI is a real neutral number, sized Rx(T-P+1), which evaluates the consistency of regional development aiming at the equitable offer of opportunities, calculated through Equations 7 and 8.
If the economy is concentrated, ( , ) ≡ 0 and the opportunities are distributed.
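A sketch of the spectral step in Python with NumPy follows. It assumes, for illustration only, that the CI summarizes the eigenvalue vector of a COVM by its Euclidean norm; Equations 7 and 8 may use a different summary. Under this assumption, identical LQ values across sectors give a zero covariance matrix and hence a CI of zero.

```python
import numpy as np

def concentration_index(covm):
    """Euclidean norm of the eigenvalues of a symmetric covariance matrix."""
    eigenvalues = np.linalg.eigvalsh(np.asarray(covm, dtype=float))
    return float(np.linalg.norm(eigenvalues))

# Every sector has LQ = 1 in every unit of time, so the covariance is zero.
even = np.cov(np.ones((3, 4)), rowvar=False)
ci_even = concentration_index(even)

# Hypothetical uneven case: the LQ differs strongly across sectors.
uneven = np.cov(np.array([[0.2, 0.3, 0.2, 0.4],
                          [1.0, 1.1, 0.9, 1.0],
                          [2.5, 2.4, 2.6, 2.2]]), rowvar=False)
ci_uneven = concentration_index(uneven)
```

`eigvalsh` is used because the input is symmetric, which guarantees real eigenvalues.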

4) The efficiency of knowledge concentration performance
The DEA performance scores are evaluated by the production function (Equation 9) as per models CCR and BCC introduced by Charnes, Cooper and Rhodes (1978) and Banker, Charnes and Cooper (1984).
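For intuition, the CCR score has a closed form when each decision-making unit (DMU) has a single input and a single output: it is the unit's output-to-input ratio divided by the best ratio in the sample. The Python sketch below uses that special case. The sample values are hypothetical, and the general multi-input CCR and BCC models require solving a linear program per DMU, which is not shown.

```python
def ccr_scores_single(inputs, outputs):
    """CCR efficiency for single-input, single-output DMUs (constant returns to scale)."""
    if any(x <= 0 for x in inputs):
        raise ValueError("inputs must be positive")
    ratios = [y / x for x, y in zip(inputs, outputs)]
    best = max(ratios)
    if best <= 0:
        raise ValueError("at least one output must be positive")
    # The DMU with the best ratio defines the frontier and scores 1.
    return [r / best for r in ratios]

# Hypothetical DMUs: the input could be a CI value and the output an income figure.
scores = ccr_scores_single(inputs=[2.0, 4.0, 5.0], outputs=[100.0, 100.0, 125.0])
```

Here the first DMU defines the frontier, and the other two reach half of its productivity.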

5) The statistical significance of knowledge concentration performance
The quantile model is recommended by the literature because it enables the evaluation of the consistency of the coefficients in a dynamic manner, in the timeline, in quartiles and percentiles, and also because it dispenses with some of the properties demanded by the traditional models, such as homoscedasticity, as discussed by Li (2015) and Hinostroza (2017), and as done by Henriques (2019).
One of the model specifications is presented by Koenker (2005, pp. 123-125), who defines the rate of convergence and argues, in a contextualized manner, that, in order to explore the asymptotic behavior of the regression classification process, one must consider the linear location-scale model, which arises from the iid error model. For the application of the research: β is the asymptotic estimator; Y is the DEA performance score; X is the CI of the eigenvalues vector; and F is the conditional distribution function.
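The estimator behind the quantile model minimizes the asymmetric check loss. The Python sketch below fits a line with one regressor by exhaustive search, relying on the known property that an optimal quantile regression line passes through at least two observations. This brute-force approach and the sample data are assumptions chosen for brevity; it is not the estimation routine used in the research.

```python
from itertools import combinations

def check_loss(residual, q):
    """Check function: q * r for r >= 0, and (q - 1) * r for r < 0."""
    return residual * (q - (1.0 if residual < 0 else 0.0))

def quantile_line(x, y, q):
    """Return (intercept, slope) minimizing the total check loss at quantile q."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # two points with equal x do not define a regression line
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        loss = sum(check_loss(yk - intercept - slope * xk, q) for xk, yk in zip(x, y))
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2]

# Hypothetical data lying exactly on y = 1 + 2x, so every quantile gives the same line.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
intercept, slope = quantile_line(x, y, q=0.75)
```

The sign of the fitted slope is the "direction" of the coefficient discussed in the text; its significance requires standard errors, which this sketch does not compute.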
The purpose of introducing this model into the research is to evaluate the sign (direction) and the statistical significance of the coefficients.

An Analysis of Knowledge Economy in Brazil
The primary data that make up the sample gather 471,308 master's thesis and doctoral dissertation defenses in the 81 Fields of the 9 Major Fields of knowledge, from 2013 to 2018. The programs approved for those fields totaled 4,654 specialties at the master's and doctorate levels, both academic and professional, offered by 6,877 courses in the 27 FUs (Table A4) in the geography. On average, more than 76,300 master's theses and doctoral dissertations were produced per year in the last five years. The choice of knowledge production at graduate level (master's and doctorate) is adequate due to the coverage of the entire national territory, the reliability of the data, and the organization of the document production system by CAPES (Coordination for the Improvement of Higher Education Personnel) and by CNPq (National Council for Scientific and Technological Development), the regulatory agencies for knowledge production in Brazil. CAPES and CNPq were both created in 1951 with the missions of regulating knowledge production at the master's and doctorate levels and fostering scientific and technological production, respectively.

Dimensions of Knowledge Economy

Analysis of the Descriptive Statistics of Knowledge Production
The statistical estimators of the primary distribution data, totaled by FU and level of knowledge, are shown in Table A1. The estimators reveal two relevant characteristics. One is the proximity and position change between mean and median, where the mean is sometimes above and sometimes below the median. The other is the reduced dispersion measured by the Coefficient of Variation (CV). At the master's degree level, the two largest dispersions orbit around ½ standard deviation of the mean (0.442 and 0.542), attributed to the FUs AP and AC, and the two smallest ones (0.048 and 0.053) are attributed to FUs MS and RJ. These magnitudes of the CV suggest a sector concentration of the production of knowledge around the regional mean. At the doctorate level, the elasticity of the dispersion is higher, with 2 FUs, AC and RR, showing a CV around 2 standard deviations of the mean (2.082 and 1.922), while the lowest two, SP and DF, disperse with 0.099 and 0.118. The most elastic amplitude of the CV reflects an insufficiency of regional production, such as in FUs AC and RR, with high inequality.
Table A2 shows the estimators at the master's and doctorate levels. One characteristic of this distribution is that the mean stays above the median throughout the entire timeline. Another characteristic is the opposite direction of the dispersion (CV), which increases at the master's level and decreases at the doctorate level. The first characteristic produces an effect of asymmetry between the FUs, and the second indicates an increase in inequality at the master's level and a reduction of inequality at the doctorate level.
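The dispersion measure used above can be computed as in the following Python sketch. It assumes the CV is the sample standard deviation divided by the mean; whether the tables use the sample or the population standard deviation is not stated, and the yearly counts are hypothetical.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; the CV is undefined")
    return stdev(values) / m

# Hypothetical yearly document counts for one FU over six years.
cv = coefficient_of_variation([90, 100, 110, 95, 105, 100])
```

A CV of 0.5 means the standard deviation is half the mean, which is the sense in which the text speaks of dispersions of about ½ standard deviation of the mean.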

Analysis of the Dynamics of the Global Percentages by Major Field (GPMF)
The global percentages shown in Table 1 were calculated according to the model defined in equation (2) and represent the execution of the public policy of knowledge production in Brazil. The quantum of each GPMF, in each column, is the participation of the MF in the global knowledge economy. In this context, it can be observed that MF6 (applied social sciences) leads with the highest production of knowledge in five of the six units of time, at the master's level, and, at the doctorate level, throughout the entire timeline, while the lowest production of knowledge is that of MF2 (biological sciences) in five of the six units of time, at the master's level, and, at the doctorate level, it is that of MF8 (language studies and arts) throughout the entire timeline.
These results supply robust evidence that biological sciences (MF2, studies of the origin of life) has a low priority at the master's level, with a participation ranging between 5.08% and 7.15% of all knowledge produced. It places higher at the doctorate level, between 8.75% and 10.5%, which is nevertheless less than half of the most productive MF (applied social sciences). To evaluate the fulfillment of the target set by the regulator, one would compare the MC to the percentages of the execution of each MF, but the MC is not observed. In this condition, the comparison is made indirectly by observing the CI quanta of each MF, which is developed in the next subsection.

Analysis of the CI Matrices of the Execution of the Public Policy of Knowledge Production
The CI matrices represented in Table 2 were calculated according to the model specified in Equation (8). Each row corresponds to a FU associated with the knowledge produced in a Field (F) and with the total knowledge produced in all of the Fields (MF), at the master's and doctorate levels, in each period p of the timeline T from 2013 to 2018. With T = 6 units of time and intervals of P = 4 units of time, the total number of periods is p = T - P + 1 = 3.
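The period count can be checked with a short Python sketch. It assumes the periods are overlapping windows of consecutive years, which is what the expression T - P + 1 implies; the function name is illustrative.

```python
def rolling_periods(first_year, last_year, width):
    """List the overlapping periods of `width` consecutive years."""
    n_units = last_year - first_year + 1    # T, the number of units of time
    n_periods = n_units - width + 1         # p = T - P + 1
    return [(first_year + k, first_year + k + width - 1) for k in range(n_periods)]

periods = rolling_periods(2013, 2018, width=4)
```

For the timeline in this article, the windows are 2013 to 2016, 2014 to 2017, and 2015 to 2018.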
Through the metrics of the spectral model, an LQ index approaching 1 implies a CI quantum approaching zero. The left portion of Table 2 shows a CI lower than 1 in all CIMMF and CIDMF periods. For knowledge concentration by Field (F), the CI quanta are all higher than 1. This suggests that the view of the economy regulator in Brazil is focused on knowledge production by Major Field, at the expense of the Field.
Thus, out of the findings of the research, two are significantly relevant: (1) in knowledge by MF, the reality of 6 of the 27 FUs (AC, AP, RN, RO, RR, TO) is very different from that of the others in the production of knowledge, indicating a high level of regional inequality; (2) in knowledge by F, FUs of the South and Southeast regions (MG, RJ, PR, SC, SP) show less inequality, despite being far from convergence.
Considering that the Market Condition (MC) is not observed, it is possible to safely infer that the regulator's target, if there is one, was not met, because such a fulfillment would necessarily imply a CI quantum close to zero, a situation that is only present in FU RS. Under these circumstances, there is no evidence of monitoring of the implementation of the public policy for knowledge production, or, if there was such monitoring, it was not effective.

Analysis of the Performance of Knowledge Concentration
The performance status of the concentration of knowledge in the improvement of family income is analyzed by the DEA score, in a first stage. This entails the use of the production function defined by Equation (9). The status defines the classification of performance as fully efficient, partially efficient, and approaching the efficiency frontier, as shown in Table 3.
Full efficiency is met when the FU/DMU is a benchmark with all scores equal to 1; in this article, the total number of scores is 12. Partial efficiency happens when the FU/DMU obtains at least one score equal to 1 among all possible scores. An approach to the efficiency frontier is shown when the FU/DMU does not satisfy the first two conditions but has at least one DEA score at or above the 0.75 cutoff defined in this research. The criterion for computing the quantum of the score in each FU/DMU considers the whole of the dimensions Field or Major Field, period, level of knowledge, and the fixed and variable effects, input and output, CCR and BCC, attributes of the DEA model. Under this criterion, only 16 of the 27 FU/DMU obtained a score greater than or equal to the cutoff.
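The three-way classification can be expressed as a small Python function. The function name, the numerical tolerance, and the label for units that meet none of the conditions are assumptions for illustration; the 0.75 cutoff comes from the text.

```python
def efficiency_status(scores, cutoff=0.75, tol=1e-9):
    """Classify a FU/DMU from its list of DEA scores."""
    if not scores:
        raise ValueError("at least one score is required")
    at_frontier = [abs(s - 1.0) <= tol for s in scores]
    if all(at_frontier):
        return "fully efficient"
    if any(at_frontier):
        return "partially efficient"
    if any(s >= cutoff for s in scores):
        return "approaching the efficiency frontier"
    return "below cutoff"  # hypothetical label for the remaining units

status_full = efficiency_status([1.0] * 12)
status_partial = efficiency_status([1.0, 0.4, 0.9])
status_near = efficiency_status([0.8, 0.3])
status_low = efficiency_status([0.2, 0.5])
```

The order of the checks matters: full efficiency is tested first so that a unit scoring 1 everywhere is not reported as only partially efficient.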

1) Status of classification of performance efficiency
The performance efficiency results were obtained with the use of the open-source Gretl statistical package. Full efficiency was only achieved by FU RS, which scored 1 in the 12 possible scores, indicating that knowledge concentration improved the regional family income, as highlighted in the first line of Table 3.
The status of partial efficiency was achieved by 8 of the 27 FU/DMU (AL, BA, CE, MA, MG, PA, PB, SP), which scored between 2 and 8 out of the 12 possible scores. This score shows that the FUs with the best performance are SP and MG, followed by AL and MA, and, with the worst performance, CE and PA.
Approaching the efficiency frontier are 7 of the 27 FU/DMU (AM, PI, AC, SE, PE, PR, RN), which obtained from 2 to 5 scores at least equal to the cutoff. This score suggests that a high CI contributes little to the improvement of family income.
Thus, the model's responses for the three efficient performance statuses suggest that the optimal knowledge concentration impacts the AHI. However, the data from Table A4 reveal that the highest AHI, attributed to FU DF, did not even reach the cutoff. In light of this robust evidence, the inference is that, while a CI approaching zero improves family income, this income can be improved regardless of the optimal status of knowledge production, as in the findings of Zabala-Iturriagagoitia, Voigt, Gutiérrez-Gracia, and Jiménez-Sáez (2007); Susiluoto (2003); Cooke, Uranga, and Etxebarria (1997); Antonelli (2008); and Quatraro (2010), who did not find evidence that a higher income is related to efficiency.
In spite of the robust evidence presented, it is necessary to consider a relevant restriction to the model in a first stage. The CI and the AHI are associated with the same FU in the timeline. However, due to social mobility, knowledge produced in one FU may impact income in another FU. If this mobility takes place, performance converted into efficiency may present an upward or downward bias and impact the classification status.

Analysis of the Statistical Significance of the Concentration of Knowledge
Knowing the impact of the degree of knowledge concentration on family income, which defined the efficiency status, what is under analysis now is the statistical significance of this concentration and the direction of the coefficients, in a second stage, through the quantile model of linear regression, as defined in 3.2(4), using the DEA score and the degree of knowledge concentration measured by the CI. This model was used by Brown and Scott (2012) to explain the migration of human capital and agglomerations in Canada.
For each of the 16 FUs with performances placed between full efficiency and approaching the efficiency frontier (Table A3), 24 regressions were conducted (three by period/Field and three by period/Major Field, in percentiles 0.25, 0.50, 0.75, and 0.99), and in all coefficients the direction is positive (+). The choice was made to use only the CI of the knowledge produced at the master's level (CIM) because the production base is larger, with a broader coverage of observations by Field and Major Field of knowledge production within the regional geography.
Market conditions with the metrics defined by the regulator for the fulfillment of goals are not observable. In the absence of these metrics, one infers the achievement of the goal from a CI near zero, which defines the optimal status of a region's development. In the geography of all 27 FUs, only FU RS indicates the fulfillment of the goal, with a CI near zero for the knowledge produced by Major Field. For the knowledge produced by Field, all FUs diverge from the fulfillment of the goal. The Major Fields with the highest level of quantitative participation in total knowledge production are Social Sciences and Human Sciences, while Biological Sciences and Multidisciplinary present the lowest levels of participation.
Efficiency of knowledge concentration performance, in the improvement of family income, was only observed in FU RS, with a DEA score equal to 1 in all time periods. Only 16 out of the 27 FUs had a performance score between 0.75 and 1. The statistical significance of these performance scores holds at the 10%, 5%, and 1% levels.
The evidence is robust in indicating that knowledge production in Brazil is not regionally developed, and the absence of a market condition with target setting does not allow for the fulfillment of the market regulator's objectives. To mitigate this gap, the present research relied on the spectral methodology grounded on LQ regional matrices (LQRM) and covariance matrices (COVM), whose characteristic polynomial eigenvalues (λ) generate the CI calculation, which indicated that knowledge concentration improves family income. Nevertheless, family income evolves regardless, possibly because social mobility interferes in the region's income. This hypothesis, however, has not been assessed.