Predicting the Evolution of Literature on Industrial Symbiosis Using Topic Modeling

The aim of this literature review is to suggest future research on industrial symbiosis. The study is based on an analysis of the latent topics related to industrial symbiosis found in a collection of 611 scientific articles retrieved from Web of Science on 2022/3/11. Our research process applies topic modelling, a technique that groups documents discussing similar topics even if these topics are not explicitly mentioned in the documents themselves. The initial classification of the existing body of work made it possible to identify current trends and developments in the field and the most promising lines of future research. Specifically, we identified 12 themes, which we characterised with a short sentence rather than a single label. Among the 12 themes, we highlight those that are attracting increasing attention from scholars, as distinct from those that are declining or remaining stable over time.


Introduction
In this increasingly resource-constrained and environmentally conscious world, the need for industrial symbiosis (IS) is greater than ever. Since the introduction of the concept of industrial symbiosis by Christensen (1992), manager of the Kalundborg eco-industrial park in Denmark, IS has been the subject of several definitions. One of the most significant for the literature is that of Chertow (2000): "Industrial symbiosis involves traditionally separate industries in a collective approach to competitive advantage that involves the physical exchange of materials, energy, water and/or by-products. The keys to industrial symbiosis are collaboration and the synergistic opportunities offered by geographic proximity." This definition clarifies the goal of IS as a process that creates a more closed-loop system in which the waste, by-products and energy of one company become resources for another.
The main implementations of IS are in eco-industrial parks (EIPs), which are industrial areas designed and managed to be environmentally friendly and sustainable. As in the case of IS, there are many definitions of EIPs. Among these is Lowe's (1997), which describes them as "a community of manufacturing and service companies seeking improved environmental and economic performance through cooperation in managing environmental and resource issues, including energy, water, and materials. By working together, the business community seeks a collective benefit that is greater than the sum of the individual benefits that each company would achieve by optimising its individual performance alone."
Nevertheless, several challenges stand in the way of IS as a successful strategy. One of the main challenges is the lack of awareness and understanding of IS among businesses. IS is still a new concept and there is not enough information available about it. Another challenge is the lack of incentives for businesses to participate in IS. The benefits of IS are often long-term and may not be immediately apparent to businesses. As a result, there is a need for more research into the economic benefits of IS to raise awareness and understanding of its potential to create value.
As symbiotic initiatives have developed in many countries around the world, promoted by individual public policy makers or self-organised private enterprises, an extensive academic literature has also developed analysing the many issues and opportunities presented by these initiatives. This literature ranges from the purely technical aspects of coordinating flows (of by-products, waste, energy) to those of the economic and managerial context.
The aim of this paper is to predict the evolution of the IS literature, based on a preliminary systematisation of the existing body of knowledge, using the topics discovered by topic modelling. Topic modelling allows documents that discuss related topics to be grouped together, even if these topics are not explicitly mentioned in the documents themselves. This is an improvement over existing methods of literature analysis (narrative, systematic, bibliometric) because it is based on soft clustering, also known as probabilistic clustering, which assigns each document a probability (or degree of membership) for every cluster, here every topic, whereas hard clustering assigns a data point to only one cluster.
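The difference between soft and hard clustering can be illustrated with a minimal Python sketch. The topic names and probabilities below are hypothetical values chosen for illustration, and the 0.10 threshold is an assumption, not a parameter used in the study.

```python
# Hypothetical document-topic probabilities for a single abstract
# (illustrative values only, not results from the study).
doc_topic = {"topic_2": 0.55, "topic_7": 0.30, "topic_9": 0.15}


def hard_assignment(memberships):
    """Hard clustering: keep only the single most probable topic."""
    return max(memberships, key=memberships.get)


def soft_assignment(memberships, threshold=0.10):
    """Soft clustering: keep every topic whose probability reaches the threshold."""
    return {topic: p for topic, p in memberships.items() if p >= threshold}


print(hard_assignment(doc_topic))
print(soft_assignment(doc_topic))
```

Under hard clustering the abstract would count only towards its dominant topic, whereas the soft view preserves its partial membership in the other two.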
Although there are literature reviews on IS (Chertow, Kanaoka and Park, 2021), most of them use a bibliometric approach (some previous reviews are recalled in the appendix of that article, to which we refer interested readers). Our study contributes both to the literature on IS and to the methodology of literature reviews by applying topic modelling for the first time: to the best of the authors' current knowledge, no review of the IS literature has been conducted using this methodology. Additionally, the methodology and tools adopted do not aim at a simple survey of the state of knowledge on the topic based on previously published research (the third purpose of literature reviews according to Baumeister and Leary (1997)), but rather at answering the following research questions:
RQ1: What are the latent themes in the scientific literature on industrial symbiosis?
RQ2: In what proportions are these topics present in the existing literature?
RQ3: Which topics will be more popular among scholars in the future?
This paper is structured as follows: after the introduction, in which the motivations for the study and the research questions are clarified, section 2 recalls the main historical roots of the scientific literature on industrial symbiosis. The dataset used for the analysis, as well as some remarks on topic modelling and the tools used, are presented in section 3. Section 4 presents and justifies the values of the parameters used in the analysis. The results of the study are the subject of the following section, which is divided into a summary presenting the results of the analysis in tabular and graphical form and a separate sub-section dedicated to the complex issue of labelling and characterising the topics found. The study continues with a discussion of the results, in which the authors present their remarks on the distribution of scientific attention across the topics, its evolution over time and a forecast of which topics will be studied more in the future. Concluding remarks and limitations of the study are presented in section 7.

Roots of Scientific Literature
Most scientific contributions on IS refer to the work of Frosch and Gallopoulos (1989), "Strategies for manufacturing", which opens with the statement: "Wastes from one industrial process can serve as raw materials for another, thereby reducing the impact of industry on the environment". Of course, attention to operational exchanges between firms goes back much further. Without going back to the classics of economic and environmental literature (Simmonds, 1862; Koller & Stocks, 1918), for the historical analysis of which we refer to the works of Desrochers (2002, 2004) and Desrochers and Sautet (2008), the idea that firms are attracted to locations where they can minimise costs and maximise profits by exploiting nearby available resources and the residues of other firms is already present in Renner (1947). Renner's work remains highly cited and influential in economic geography and economics. After the extension of the concept of symbiosis to industrial production by the aforementioned Christensen (1992), studies multiplied thanks to the commitment of Chertow (2000, 2007), still among the leading scholars of IS, together with Mirata (2004), Laybourn and Lombardi (2007) and Ashton (2008), among others, who prepared the intellectual ground for the subsequent literature. In particular, Chertow's (2000) taxonomy, which identifies five types of IS, namely "through the exchange of waste", "within a facility, firm or organisation", "between firms colocated in a defined eco-industrial park", "between local firms not colocated" and "between firms organised virtually in a wider region", still constitutes a point of reference for studies on IS from any disciplinary perspective.
Chertow's studies then turn to the genesis of IS (Chertow, 2007) and lead the researcher to conclude that "both the discovery of existing symbioses and the attempt to design and build eco-industrial parks involving physical exchange have not led to as much sustainable industrial development as has been achieved through efforts to study and replicate what has largely self-organised in Kalundborg, Denmark". The article makes recommendations to encourage the identification and discovery of existing 'kernels' of symbiosis, as well as policies and practices to identify early-stage precursors of potentially larger symbioses. Mirata (2004) turns his attention to IS programmes covering large geographical areas and discusses the factors that influence the development and sustainability of regional industrial symbiosis networks. In particular, the author discusses the role of a coordinating body in modifying these factors to catalyse the development of IS networks, looking at the experiences of three regional IS programmes in the UK and the National IS Programme (NISP). Differences between the cases studied lead the author to conclude that the nature of the companies' operations and the industrial history of the regions, the level of peer pressure, the positioning of the coordinating body in the region, and the approach to awareness raising and recruitment have a major influence on the progress of IS programmes. Laybourn and Lombardi (2007) examine the UK NISP, the first national IS programme, highlighting how it aims to bring together companies and organisations of all sectors and sizes to achieve significant environmental benefits, such as greenhouse gas reductions and landfill diversion, as well as cost reductions and new sales. Ashton (2008) uses social network analysis (SNA) to examine the prevalence of industrial symbiosis in Barceloneta, Puerto Rico. The study found that IS is not as common as product sales, but is more common among the pharmaceutical companies at the core of the regional network. The author concludes that social network analysis is a useful tool for examining different relationships in an industrial ecosystem.

Materials
The database used for the analysis (hereafter referred to as the "corpus" or "complete collection") is composed of the abstracts of 611 articles published in peer-reviewed journals, retrieved from Web of Science (WoS) on 11 March 2022, with a query limited to articles containing "industrial symbiosis" in the title or author keywords. The choice of Web of Science is based on the authority of the bibliographic database and the completeness and manageability of the metadata it provides. The restrictions applied to the search are in line with the aim of obtaining a database focused on the subject, thus avoiding the inclusion of papers whose abstracts mention "industrial symbiosis" only in passing. For the purposes of reproducibility and verification and/or updating of this work, the search is reported in full below.
"industrial symbiosis" (title) or "industrial symbiosis" (author keywords) and conference paper or review article or editorial material or early access or book chapter or correction or meeting abstract or retracted publication or news item or book (exclude -document types) and English (languages).
In terms of words and n-grams (hereafter referred to as "terms") to be analysed, the sample contains 2 842 unique terms and 35 706 term occurrences in total; the 180 most frequent terms account for 21 084 of these occurrences, a cumulative frequency of 59.05%. In the Appendix, the database is described in detail in terms of WoS research areas, publication titles, author affiliations, authors and publication years (Table A1).
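The cumulative frequency quoted above can be checked with a short Python sketch. The function name and the toy term counts are hypothetical; only the two totals (21 084 and 35 706) come from the text.

```python
from collections import Counter


def coverage_at_min_frequency(term_counts, min_frequency):
    """Return (number of selected terms, share of all occurrences in percent)."""
    selected = {t: c for t, c in term_counts.items() if c >= min_frequency}
    total = sum(term_counts.values())
    return len(selected), 100.0 * sum(selected.values()) / total


# Toy counts, for illustration only (not the study's term frequencies).
toy_counts = Counter({"waste": 50, "energy": 45, "park": 30, "heat": 5})
print(coverage_at_min_frequency(toy_counts, min_frequency=45))

# Figures reported in the text: 21 084 of 35 706 occurrences.
reported_share = f"{100.0 * 21_084 / 35_706:.2f}%"
print(reported_share)
```

The last line reproduces the 59.05% reported for the 180 selected terms.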

Methodology
Topic modelling is a statistical technique for discovering the latent structure in a collection of documents. The technique can capture word correlations in a collection of textual documents with a low-dimensional set of multinomial distributions called "topics" (Cao, Xia, Li, Zhang, & Tang, 2009). Blei, Ng, and Jordan (2003) and Blei (2012) present a probabilistic approach to topic modelling, Latent Dirichlet Allocation (LDA), which has become the de facto standard: each document is considered as a mixture of topics, and each topic is characterised by a distribution over words.
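For readers who prefer a formal statement, the generative view of Latent Dirichlet Allocation described above can be written as follows. The notation is the one commonly used for the smoothed model and is an assumption here, since the text does not fix symbols: K is the number of topics, D the number of documents, N_d the number of terms in document d, and alpha and beta are the Dirichlet hyperparameters.

```latex
\begin{align*}
\varphi_k &\sim \mathrm{Dirichlet}(\beta), & k &= 1,\dots,K
  && \text{(word distribution of topic } k\text{)} \\
\theta_d &\sim \mathrm{Dirichlet}(\alpha), & d &= 1,\dots,D
  && \text{(topic mixture of document } d\text{)} \\
z_{d,n} \mid \theta_d &\sim \mathrm{Multinomial}(\theta_d), & n &= 1,\dots,N_d
  && \text{(topic of the } n\text{-th term)} \\
w_{d,n} \mid z_{d,n}, \varphi &\sim \mathrm{Multinomial}(\varphi_{z_{d,n}})
  &&&& \text{(observed term)}
\end{align*}
```

In the setting of this study, D would correspond to the 611 abstracts and K to the number of topics selected in the Analysis section.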
The usefulness of topic modelling is widely recognised, and the technique has been applied in many domains beyond computer science, such as marketing (Amado, Cortez, Rita, & Moro, 2018), technology forecasting (Momeni & Rost, 2016), hospitality (Park, Chae, & Kwon, 2018), and others. As an unsupervised machine learning method that can automatically analyse and cluster documents based on their content, it can group documents that discuss related topics, even if these topics are not explicitly mentioned in the documents themselves. This makes it useful for literature reviews, as it can reduce the time needed to read and identify relevant papers. In this context, topic modelling can be used to identify and track emerging trends and topics of interest, to assess the relationships between different concepts and ideas, to identify gaps or potentially overlooked papers, and to monitor changes in the field over time. Hannigan et al. (2019) provide an overview of how topic modelling can be used for literature reviews in business management, discussing its benefits and providing guidance on how to select and use appropriate algorithms.
In this regard, previous studies have demonstrated the potential of topic modelling for literature reviews. For example, Kitanaka, Kwiatek, and Panagopoulos (2021) found that artificial intelligence and machine learning can help academics in tasks such as writing literature reviews; they also found that the academic research published in the Journal of Personal Selling & Sales Management (JPSSM) over 40 years accurately reflected the reality of business. Park et al. (2018) used topic modelling to study the intellectual structure of four hospitality journals over 40 years, discovering 50 topics and 8 subgroups with different evolutionary patterns of topic popularity and providing insights to predict the future evolution of scholarly attention to each topic; moreover, significant differences in topic proportions were found across the four leading hospitality journals, suggesting different research foci in each journal. Talafidaryani (2021) used topic modelling to analyse the information systems literature on dynamic capabilities and to provide a holistic understanding of the topic composition and trend of dynamic capabilities studies in information systems research.

Tools
As a tool, we used KH Coder, a text mining and topic modelling tool that can be used to analyse large amounts of text data (Higuchi, 2016). It can be used to discover hidden patterns and relationships, generate new hypotheses and ideas, and create models that can be used to predict future events or trends.

Analysis
First, we identified the terms in the corpus that required special handling, namely n-grams and stopwords. Both can have a significant impact on the results of topic modelling, so we identified the n-grams to include in the analysis and removed the stopwords from the corpus before proceeding.
Stopwords are words that are commonly used in a language but have little meaning, such as 'a', 'the' and 'of' (standard stopwords). In addition to the standard stopwords, we manually created a list of custom stopwords using the word frequency function in KH Coder, looking at the most common words in the corpus that have little meaning in relation to the aims of our work. The custom stopwords identified are listed in the Appendix.
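As an illustration of this filtering step, the following Python sketch removes standard and custom stopwords from a tokenised sentence. It is not the KH Coder procedure itself: the function name and the example sentence are hypothetical, the standard list is limited to the three examples given above, and the custom list is a subset of the one reported in the Appendix.

```python
# The three standard stopwords named in the text (a real list would be longer).
STANDARD_STOPWORDS = {"a", "the", "of"}
# A subset of the custom stopwords reported in the Appendix.
CUSTOM_STOPWORDS = {"study", "research", "approach", "analysis", "result", "paper"}


def remove_stopwords(tokens, stopwords):
    """Drop every token that appears in the stopword set (case-insensitive)."""
    return [token for token in tokens if token.lower() not in stopwords]


# Hypothetical sentence used only to show the effect of the filter.
tokens = "The paper presents a study of waste heat recovery".split()
kept = remove_stopwords(tokens, STANDARD_STOPWORDS | CUSTOM_STOPWORDS)
print(kept)
```

Only the content-bearing terms survive, which is what allows the topic model to focus on domain vocabulary rather than on generic academic wording.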
N-grams are contiguous sequences of n items from a given sequence of text or speech. The search for n-grams was performed by analysing the text data in VOSviewer (van Eck & Waltman, 2010, 2011), which yielded the n-grams reported in the Appendix.
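One simple way to make a model treat an n-gram as a single term is to join its words into one token before tokenisation. The Python sketch below shows this idea under stated assumptions: the underscore convention and the function name are hypothetical, the n-grams are a subset of the Appendix list, and the text does not say that KH Coder or VOSviewer proceed in this way.

```python
# A subset of the n-grams reported in the Appendix.
NGRAMS = [
    "sustainable supply chain management",
    "supply chain",
    "circular economy",
    "industrial symbiosis",
]


def merge_ngrams(text, ngrams):
    """Lower-case the text and join each known n-gram into one underscore token."""
    merged = text.lower()
    # Longest n-grams first, so a short n-gram cannot break up a longer one.
    for ngram in sorted(ngrams, key=len, reverse=True):
        merged = merged.replace(ngram, ngram.replace(" ", "_"))
    return merged.split()


print(merge_ngrams("Industrial symbiosis supports the circular economy", NGRAMS))
```

Processing the longest n-grams first keeps "sustainable supply chain management" intact instead of reducing it to the shorter "supply chain".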
When planning to use topic modelling to analyse scientific literature (or any collection of texts), the number of unique words and n-grams to be analysed must be considered. Indeed, the optimal number of topics (K) to which the model is constrained is closely related to the chosen minimum frequency of terms and thus to the number of terms selected for analysis. The literature offers different methods for determining the optimal number of topics in relation to the size of the text collection and the diversity of the terms it contains. To determine the value of K, we used the R package ldatuning (Nikita, 2016), which offers a simple way to calculate four metrics at once. Two of the metrics, Arun, Suresh, Veni Madhavan, and Murthy (2010) and Cao et al. (2009), have to be minimised (top panel in Graph 1); the other two, Griffiths and Steyvers (2004) and Deveaud, SanJuan, and Bellot (2014), have to be maximised (bottom panel). Running ldatuning in KH Coder with a minimum term frequency of 45 (180 selected words and n-grams, cumulative frequency 59.05%), we obtained 12 as the optimal number of topics for our collection of documents, based on the agreement between the Cao et al. and Deveaud et al. metrics; the Griffiths and Steyvers and Arun et al. metrics do not seem informative in this situation (Graph 1).
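To give an intuition for why one of these metrics is minimised, the Python sketch below computes the average pairwise cosine similarity between topic-word distributions, which is the idea behind the Cao et al. (2009) criterion: the more the topics overlap, the higher the value. This is a simplified illustration and not the ldatuning implementation; the function names and the toy distributions are hypothetical.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_topic_similarity(topic_word):
    """Mean cosine similarity over all pairs of topic-word distributions."""
    pairs = list(combinations(topic_word, 2))
    return sum(cosine_similarity(u, v) for u, v in pairs) / len(pairs)


# Toy topic-word distributions over a four-term vocabulary (illustrative only).
distinct_topics = [[0.7, 0.3, 0.0, 0.0], [0.0, 0.0, 0.4, 0.6]]
overlapping_topics = [[0.4, 0.3, 0.2, 0.1], [0.3, 0.3, 0.2, 0.2]]

print(average_topic_similarity(distinct_topics))
print(average_topic_similarity(overlapping_topics))
```

In practice such a score would be computed for models fitted with several candidate values of K, and lower values would be preferred.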

Summary
The collections of terms that define each topic are shown in Tab. 1, while the distribution of topics across the collection is shown in Figure 2. Regarding the evolution of scientific attention to individual themes over the period 2006-2021 (Fig. A2 in the Appendix), the study found that themes 9, 6, 2 and 7 show a markedly increasing trend, while themes 3, 11 and 1 show a markedly decreasing trend. The attention paid to themes 4 and 8 appears to be stable, while that paid to themes 10 and 12 appears to be slightly increasing and that paid to theme 5 slightly decreasing.
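A classification into increasing, decreasing and stable trends can be sketched in Python by fitting a straight line to the yearly share of a topic and looking at the sign of the slope. The text does not state how the trends were estimated, so the least-squares slope, the tolerance value and the yearly shares below are all assumptions made for illustration.

```python
def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


def classify_trend(slope, tolerance=0.001):
    """Label a slope; the tolerance is an arbitrary illustrative threshold."""
    if slope > tolerance:
        return "increasing"
    if slope < -tolerance:
        return "decreasing"
    return "stable"


# Hypothetical yearly shares of one topic (not data from the study).
years = [2018, 2019, 2020, 2021]
shares = [0.06, 0.07, 0.08, 0.09]
slope = ols_slope(years, shares)
print(classify_trend(slope))
```

The choice of tolerance determines where "slightly increasing" ends and "stable" begins, so it should be reported alongside any classification of this kind.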

Characterizing and Labeling Topics
The topic labelling phase, which involves understanding the character of each topic, is one of the critical phases in the interpretation of the results of topic modelling. Among the different methods of topic labelling (using a predefined set of labels, such as the set of categories in a taxonomy; using a set of keywords or terms that best describe the topic; reviewing the set of most representative documents of the topic), we decided to use the last one. This required the reading and critical review of the ten most representative abstracts of each topic, carried out separately by two of the authors with a procedure not dissimilar to that suggested by Tranfield, Denyer and Smart (2003) for conducting systematic literature reviews. We then characterised the topics by a short sentence, rather than by a single label, as suggested by Wan and Wang (2016). The results are summarised in Tab. 2, which cites the articles that best represent the character of each topic based on the described procedure. The probability of each of the 10 articles belonging to the topic can be found in the supporting information (sheet named 'Data for table 2').
Tab. 2 (partially recovered). Topic labels and representative articles
(Sun et al., 2017), (Martin, Svensson, & Eklund, 2015), (Martin, 2015),
…and their Evolution (Wu, Lu, & Jin, 2021), (Behera, Kim, Lee, Suh, & Park, 2012)
#5 Catalysts, Barriers, Facilitators (Online Information-Sharing Platforms) (Fraccascia, 2020), (Fraccascia & Yazan, 2018), (Chen & Liu, 2021)
(Secchi, Castellani, Collina, Mirabella, & Sala, 2016), (Royne, Berlin, & Ringstrom, 2015), (Diaz et al., 2021)
#6a Assessment of environmental benefits (Martin, Wetterlund, Hackl, Holmgren, & Peck, 2020), (Marcinkowski, 2019)
#7 Carbon Emissions Reduction (B. Zhang, Wang, & Lai, 2016), (Dong et al., 2014), (Yu, Han, & Cui, 2015), (B.
(Chari et al., 2022), (Morales, Lhuillery, & Ghobakhloo, 2022)
#10 Synergies Discovery at regional scale
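Selecting the most representative documents of a topic amounts to ranking the documents by their probability of belonging to that topic. The Python sketch below shows this ranking on a toy example; the document identifiers, topic names and probabilities are hypothetical, and the function is an illustration rather than the procedure implemented in KH Coder.

```python
def most_representative(doc_topic_probs, topic, n=10):
    """Return the n documents with the highest probability for the given topic."""
    ranked = sorted(
        doc_topic_probs.items(),
        key=lambda item: item[1][topic],
        reverse=True,
    )
    return [(doc_id, probs[topic]) for doc_id, probs in ranked[:n]]


# Hypothetical document-topic probabilities (not data from the study).
doc_topic_probs = {
    "doc_a": {"topic_5": 0.70, "topic_7": 0.30},
    "doc_b": {"topic_5": 0.20, "topic_7": 0.80},
    "doc_c": {"topic_5": 0.55, "topic_7": 0.45},
}

print(most_representative(doc_topic_probs, "topic_5", n=2))
```

With n=10, the same call would return the ten abstracts per topic that the two authors then read and reviewed.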

Discussion
Against the general background of the many phenomena that are affecting our society and pushing it towards environmental protection, reflected in a growing scientific literature (Figure A1 in the Appendix), our results show only slight differences among topics in terms of the volume of scientific production in the collection analysed: none of the topics appears to be neglected, as each has a similar probability (ranging from 7.60% to 9.37%) of appearing in the entire collection. By contrast, the dynamics of topic popularity over 2006-2021, which is the most interesting aspect for the purposes of this work, shows significant differences that lead us to a preliminary classification based on the trend (increasing, decreasing or stable) that characterises each topic. In the first group (9, 6, 2, 7) we find circularity assessment, supply chain sustainability, life cycle assessment (based on case studies), environmental benefit assessment, optimisation of material and energy integration (with a view to eco-industrial park planning) and carbon emission reduction. In the second group (1, 3, 11) we find industrial clusters moving towards industrial symbiosis, energy and heat recovery and assessment of environmental and economic benefits. With few differences in terms of growth or decline coefficients, the attention of researchers to the remaining themes (4, 8, 10, 12, 5), namely eco-industrial symbiosis networks and their evolution, resource recovery (water, heat), discovery of synergies at regional level, waste management and reuse, and catalysts, barriers and facilitators (online information exchange platforms), has remained largely stable over time.
It is also possible to observe that the topics covered most so far (1, 3, 11) show negative growth rates, indicating that studies on the assessment of environmental and economic benefits, the transformation of industrial clusters into EIPs and energy and heat recovery can be considered to be in a mature phase of their life cycle, and that the literature may be saturated with research on these topics. Conversely, setting aside topics 4, 5, 8, 10 and 12, whose trend is only slightly increasing or decreasing, the topics related to the optimisation of material and energy integration (with a view to the planning of EIPs), life cycle assessment (based on case studies), the reduction of carbon emissions and the assessment of circularity/sustainability of the supply chain show an increasing trend.
Several explanations are possible for these trends, such as shifts in the scientific community's understanding of the topics, the emergence of new data or techniques that enable deeper understanding, or the fact that certain topics receive more or less attention in public debate, which in turn attracts more or less scientific focus. Nevertheless, we believe that some of these trends can be attributed to particular causes. For example, the increasing attention to circularity assessment (Topic 9) may be due to the need to complement studies on the configuration and completeness of EIPs (mainly found in Topic 4, which shows a slight downward trend) by taking into account larger sets of indicators and their complex interactions from the perspective of the circular economy. A similar consideration can be made for life cycle assessment: like other topics that have scientific roots in industrial ecology, life cycle assessment is attracting growing interest from circular economy scholars. In fact, supplementary research conducted in WoS for a paper in preparation shows that, until 2017, contributions on life cycle assessment were mostly framed through the authors' keywords in the context of industrial ecology rather than the circular economy, whereas since 2018 this proportion has decidedly reversed in favour of classification in the latter (sheet "analyze years LCA" in Supporting Information).
ijbm.ccsenet.org International Journal of Business and Management Vol. 18, No. 5
The increase in interest in Topic 2 (optimising material and energy integration with a view to EIP planning) can be explained by the fact that networks of companies in symbiosis are entities that are constantly evolving (as regards the optimisation of sewage streams) and expanding (as regards the possibility of including other enterprises and thus enlarging their range of action), and therefore require more research effort. From another point of view, we believe that the growing interest in issues such as carbon emissions reduction and supply chain sustainability has its main reasons in the phenomena that our society is going through: for the former, the urgency, on a planetary level, of finding countermeasures to the negative effects of air pollution, above all climate change; for the latter, the pressure created by the Covid-19 pandemic to guarantee supplies, also supported by the emergence of new technologies that make it easier to track and manage supply chains. As regards the topics whose popularity appears to be declining, for topics 3 and 11, given their highly technological character, we argue that the underlying technologies are in a maturing phase; for topic 1, which has a strong methodological character, the trend can be attributed to the many existing applications of consolidated methodologies for assessing the synergies and benefits of IS.

Limitations of the Study
Some important limitations must be carefully considered when using topic modelling for literature reviews: first, topic models are based on the statistical analysis of term frequencies, which means that they may not capture all aspects of meaning in a text (e.g. metaphor or metonymy); second, because topic models are based on probabilistic algorithms, they may produce different results when applied to different datasets or when different parameter values are used; third, because topic models are unsupervised methods, they require some form of manual interpretation by the researcher in order to be useful. In other words, the limitations of making a prediction about the literature using topic modelling stem from the constraints that the analyst imposes on the many degrees of freedom inherent in the process: the results are not absolute, but relative to those constraints. From the choice of the bibliographic database to be consulted to the definition of the minimum frequency of terms (words, n-grams) to be considered, that is, the desired granularity of the analysis, the analyst's choices can indeed influence the results and the conclusions that can be drawn from their discussion. In this study, the main choices made were: the bibliographic database to be queried; the formulation of the query; the type of texts to be analysed (we dealt with the abstracts of the 611 articles rather than the titles, full texts or keywords); the minimum frequency of words and n-grams in the abstracts; and the method used to define the optimal number of topics, which strictly depends on the minimum frequency of words and n-grams.
These brief observations point the way to various lines of research: first, checking the consistency of the results obtained with those that could be obtained by consulting another bibliographic database and with those obtained by analysing titles, keywords or full texts; second, addressing the methodological question of the representativeness of the collection of documents extracted from the bibliographic database; third, pursuing a more detailed analysis by reducing the minimum frequency of the terms to be considered and thus increasing the number of topics.

Appendix
Custom stopwords: study, research, approach, analysis, result, case, model, paper, framework, use, article, concept, %, method, level, literature, rights, datum, cell, author
N-grams: agglomeration economies, business model, circular economy, cleaner production, closed-loop, eco-industrial development, eco-industrial parks, eco-innovation, environmental impact, environmental management, environmental performance, geographic proximity, green economy, indicator system, industrial ecology, industrial park, industrial symbiosis, industrial waste, institutional capacity, material flow analysis, raw materials, resource efficiency, resource productivity, resource recovery, resource synergies, supply chain, sustainability transitions, sustainable development, sustainable operations, sustainable production, sustainable supply chain management, urban industrial symbiosis, urban metabolism, urban symbiosis, waste management, waste valorization, wastewater, zero waste

Copyrights
Copyright for this article is retained by the author(s), with first publication rights granted to the journal.
This is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).