Global Research Collaboration for Vapour Intrusion

The complexity of the vapour intrusion (VI) transport pathway has received ever-increasing interest worldwide, and an improved and consolidated understanding of the VI issue requires collaboration between international research groups. This study applies social network analysis (SNA) to bibliometric authorship data for VI research to discover trends in collaboration and to identify lead scientists, organisations, and countries. Furthermore, some of the external factors influencing collaboration and productivity were assessed. The data suggests that, over a time span of 54 years, the global research network for VI produced 566 publications via 157 sources. The research network is composed of 437 organisations and 1053 authors from 33 countries. This suggests an increasingly active international collaborative research effort. However, inter-continental cooperation is much less frequent than continental cooperation. The five most central countries in the network are the USA, followed by Canada, China, the Netherlands, and Italy. The researchers and the organisations with the most publications are also from these five countries. The social network analysis conducted gives a good approximation of the collaborative structure for the key countries, organisations and researchers involved. Since 2010, the research community has become more stable.


Social Network Analysis
A social network contains members (nodes) connected by relationships (edges). The presence or absence of edges between the nodes characterises the network and is investigated and quantified in a social network analysis (SNA). SNA allows for the identification of the most important nodes in the network, how they are grouped, and the extent of the cooperation. Nodes can be individual researchers, groups of researchers (departments), organisations, or the countries where researchers are located. Depending on the nodes, the edges have a different meaning. For example, the edges between members such as organisations can indicate the number of shared publications, or cooperation between countries. In an SNA the edges between nodes are related to the network and are not attributes of nodes (Fonseca et al., 2016). Scientists collaborate because discovering knowledge about complex problems often requires different skills (Sonnenwald, 2007; Katz & Martin, 1997). Published work, such as peer-reviewed papers or conference proceedings, reflects the collaboration between scientists via authorship (Newman, 2004). Authorship analysis reveals the cooperation patterns between scientists, the organisations they work for and the countries in which they are situated (Melin & Persson, 1996; Glanzel & Schubert, 2005; Newman, 2004). The steps applied for this authorship analysis are collection, normalisation, and curation of publications, followed by the calculation of metrics and visualisation.

Collection, Normalisation and Curation of Author Information
Published work was obtained from several sources. The search terms used were "vapor" or "vapour" and "intrusion" in the title or abstract. The resulting database covers the following types of publications: papers from journals, chapters from books, Bachelor, Master or Doctoral theses, articles in bulletins or magazines, research reports mostly from project work, and conference proceedings. Journal papers, book chapters and theses are considered to be peer-reviewed, while the peer-review process for articles, reports, and conference proceedings is not clearly defined.
From each publication, the initial letters of the first name and the full last name of each author were recorded in a spreadsheet, checked for accuracy, and recorded together with the organisation's name, address, and geographical coordinates (longitude and latitude). As each author's name and organisation are core components of the SNA, considerable time was spent verifying the accuracy of the data collected by manually checking for spelling errors, omissions, similar names and name changes. For authors who reported multiple organisations, the first indicated organisation was selected. The department or faculty of the authors was not recorded; for example, some researchers work for the same university but in different departments. For some publications more than one corresponding author is listed; in this case the first was selected. Authors of reports were recorded individually and, if no authors were presented in the report, the publishing organisation was recorded. Some of the organisations no longer exist, so their geographical location was set to the centre of the city. Special characters in names and organisations were simplified to base letters (ü becomes u). Names of journals are written in full (not abbreviated). Publications with many authors, as for some of the reports, were investigated; reviewers were not considered, while contributing authors were retained (Adams, 2012). The standardisation, curation, and profiling of the data were performed with the Knime data analysis platform (Berthold et al., 2009; Tiwari & Sekhar, 2007). The aim was to consolidate names and organisations to ensure that cooperation is acknowledged, because inaccurate data can generate errors in the presentation of the network (Wang et al., 2012; Barbastefano et al., 2013).
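The simplification of special characters and the consolidation of author names can be illustrated with a minimal Python sketch. This is not the Knime workflow used in the study; the function names `to_base_letters` and `author_key` are hypothetical, and the matching key (full last name plus first-name initials) is an assumption made only for illustration.

```python
import unicodedata


def to_base_letters(text: str) -> str:
    """Strip diacritics so that, for example, 'ü' becomes 'u'."""
    # NFKD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def author_key(first_name: str, last_name: str) -> str:
    """Hypothetical matching key: full last name plus first-name initials."""
    initials = "".join(part[0] for part in first_name.replace(".", " ").split())
    last = to_base_letters(last_name).strip().lower()
    return f"{last}, {to_base_letters(initials).lower()}"


# Two spellings of the same (illustrative) author map to one key.
print(author_key("José M.", "Merigó"))     # merigo, jm
print(author_key("Jose Maria", "Merigo"))  # merigo, jm
```

Letters without a Unicode decomposition (for example ø or ß) pass through unchanged in this sketch and would need an explicit replacement table, and a key based on initials can merge different people, which is why manual checking remains necessary.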

Calculation of Metrics and Visualisation
The authors, organisations, and journals were then transformed into matrices that provide the nodes (like author or organisation) and their relationship via edges (like the shared publications or country). The edges indicate the intensity of the cooperation (nodes share the authorship of a publication). Authorship cooperation is considered to be reciprocal and therefore non-directional. A network provides insight into the strategic importance of a node relative to other nodes, and centrality measures are calculated to characterise the cooperation (Freeman, 1978; Ramadan, Alinsaif & Hassan, 2016). Several measures were calculated to understand the cooperation in the network; the topological measures below were used.
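As an illustration of this transformation, the following Python sketch builds weighted, non-directional edges and a symmetric matrix from a list of publications. The data and the function names `coauthorship_edges` and `adjacency_matrix` are hypothetical; the study itself used Knime, so this is only a sketch of the general idea, assuming the edge weight is the number of shared publications.

```python
from collections import Counter
from itertools import combinations


def coauthorship_edges(publications):
    """Count shared publications for every unordered pair of nodes."""
    edges = Counter()
    for authors in publications:
        # Deduplicate and sort so (a, b) and (b, a) map to one undirected edge.
        for a, b in combinations(sorted(set(authors)), 2):
            edges[(a, b)] += 1
    return edges


def adjacency_matrix(edges):
    """Return the node list and a symmetric weight matrix.

    Nodes are taken from the edges only, so a node without any
    co-author does not appear in this sketch.
    """
    nodes = sorted({n for pair in edges for n in pair})
    index = {n: i for i, n in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for (a, b), weight in edges.items():
        matrix[index[a]][index[b]] = weight
        matrix[index[b]][index[a]] = weight  # reciprocal cooperation
    return nodes, matrix


# Hypothetical publications, each given as its list of authors.
publications = [["A", "B", "C"], ["A", "B"], ["C", "D"]]
edges = coauthorship_edges(publications)
nodes, matrix = adjacency_matrix(edges)
print(edges[("A", "B")])  # 2
```

The same functions apply to organisations or countries if each publication is given as its list of organisations or countries instead of authors.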

Degree-Based Measurements
The degree measures provide insight into the number of neighbours a node has. The more connections a node has, the more important the node is in the network. A high centralisation indicates that one or more nodes connect with many others; a low centralisation indicates that the connections between nodes are more evenly distributed.
The following degree centralities are calculated (description from Knime, 2021):
1) The node degree counts the number of incident edges. The number of nodes and links represents the network size. The percent degree is the percentage of edges in comparison to the total number of edges.
2) The in- and out-degree count the number of incoming and outgoing edges. The percent incoming and outgoing degree is the percentage of edges in comparison to the total number of edges.
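The degree measures can be sketched in a few lines of Python. The function name `degree_measures` and the example edges are hypothetical. The sketch follows the description above (incident edges relative to the total number of edges) and is not the Knime implementation. Because the network is treated as non-directional, only the undirected node degree is computed.

```python
def degree_measures(edges):
    """Return node degree and percent degree for undirected (u, v) edges."""
    edge_list = list(edges)
    degree = {}
    for u, v in edge_list:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    total_edges = len(edge_list)
    # Percent degree: incident edges as a percentage of all edges.
    percent = {n: 100.0 * d / total_edges for n, d in degree.items()}
    return degree, percent


# Hypothetical network with four nodes and four edges.
example_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
degree, percent = degree_measures(example_edges)
print(degree["C"], percent["C"])  # 3 75.0
```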

Shortest-Path-Based Measurements
Distance measures were used to express the cooperation between nodes in the network. The smaller the distance between the nodes, the more related the nodes are (Gong et al., 2016).
The following distance-based measures were calculated (description from Knime, 2021):
3) The closeness centrality divides the number of nodes of the component by the sum of all distances from the analysed node to all other nodes within the component. The node with the highest value is the most central node of its component. The closeness centrality (Sabidussi, 1966; Freeman, 1978) thus reflects the average shortest-path distance from a node to the other nodes, that is, how close the node is to all other nodes. Centralisation refers to the degree to which edges are concentrated in a few nodes in the network and evaluates the presence of dominant nodes in the network.
4) The node weight sum is the sum of the weights of all incident edges, reported together with the average node weight (del Rio, Koschützki & Coello, 2009).
5) The clustering coefficient analyses the neighbourhood of the node. If the neighbours form a clique the coefficient is 1, and if no neighbour is connected to another neighbour it is 0.
6) The Barycenter assigns a score to each node according to the sum of its distances to all other nodes. The edge weight represents similarity and is converted to a distance by using the JUNG framework (O'Madadhain et al., 2005). The score indicates which node exerts the most influence in the network (Viswanath, 2009; Gadat, Gavra & Risser, 2016).
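The closeness centrality and the clustering coefficient described above can be illustrated with a small Python sketch. It assumes an unweighted, non-directional network stored as a dictionary of neighbour sets, and the names `bfs_distances`, `closeness` and `clustering_coefficient` are hypothetical. It does not reproduce the Knime or JUNG implementations, in particular not the conversion of edge weights into distances used for the Barycenter score. Returning 0 for nodes with fewer than two neighbours is a convention chosen here.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from source within its component."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adj[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def closeness(adj, node):
    """Number of nodes in the component divided by the sum of distances."""
    dist = bfs_distances(adj, node)
    total = sum(dist.values())
    return len(dist) / total if total else 0.0


def clustering_coefficient(adj, node):
    """Share of neighbour pairs that are themselves connected."""
    neighbours = list(adj[node])
    k = len(neighbours)
    if k < 2:
        return 0.0  # convention chosen for this sketch
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if neighbours[j] in adj[neighbours[i]]
    )
    return 2.0 * links / (k * (k - 1))


# Hypothetical network: a triangle A-B-C with D attached to C.
adj = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
print(closeness(adj, "A"))               # 1.0
print(clustering_coefficient(adj, "A"))  # 1.0
```

In this example node C has the highest closeness (4/3) because it reaches every other node in one step, while its clustering coefficient is only 1/3 because just one of its three neighbour pairs is connected.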

Importance Measures
7) The hubs and authority centrality assigns hub and authority scores to each node (León, 2013; Xutao, Michael & Yunming, 2012; Kleinberg, 1999). Hub and authority centrality focus on the structure of the network, determine the importance of nodes according to their positions in the graph (Marra et al., 2015), and are affected by the total relations that occur outside the node (Farhan, Darwiyanto & Asror, 2019). A node is a hub when it has edges to authoritative nodes, and it is an authority if it is referenced by hub nodes. This implies a mutually reinforcing centrality: a high hub node points to many good authorities, and a high authority node receives from many good hubs. To calculate the scores the JUNG framework is used (O'Madadhain et al., 2005).
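The mutually reinforcing definition of hubs and authorities can be sketched as a simple power iteration in the spirit of Kleinberg (1999). The function name `hits`, the fixed number of iterations and the example links are assumptions for illustration; the scores in this study were calculated with the JUNG framework, whose implementation details are not reproduced here. For a non-directional network, each edge would have to be entered in both directions.

```python
import math


def hits(out_links, iterations=50):
    """Return (hub, authority) scores for a dict of node -> linked nodes."""
    nodes = set(out_links)
    for targets in out_links.values():
        nodes.update(targets)
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of nodes pointing to it.
        auth = {n: 0.0 for n in nodes}
        for source, targets in out_links.items():
            for target in targets:
                auth[target] += hub[source]
        # Hub score: sum of the authority scores of the nodes it points to.
        hub = {n: sum(auth[t] for t in out_links.get(n, ())) for n in nodes}
        # Normalise both score vectors to unit length.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for n in scores:
                scores[n] /= norm
    return hub, auth


# Hypothetical directed example: H1 and H2 point to A1; H1 also points to A2.
links = {"H1": ["A1", "A2"], "H2": ["A1"]}
hub, auth = hits(links)
print(max(hub, key=hub.get), max(auth, key=auth.get))  # H1 A1
```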
Palladio (Edelstein, Coleman & Findlen, 2017) was applied for the geographical visualisation of the cooperation, while Knime (Berthold et al., 2009) was used to visualise the various networks (Fillbrunn et al., 2017). Network drawings were generated by using the Kamada-Kawai algorithm (Kamada & Kawai, 1989). A network describes the overall structure with the main nodes (authors) and their different connections (edges) (Valente, 2010), and factors that may have affected its configuration could be assessed against the network.
A workflow was built in Knime to produce tables that provide an overview of the different types of publications by year, the productivity of authors, organisations, countries and journals, as well as their relationships in the network (Tsay & Shu, 2011; Rosas et al., 2011; Ellegaard & Wallin, 2015; Merigó & Yang, 2017; Newman, 2004).

Results
The results below reflect the international collaboration on research efforts for VI. Tables and figures are used for descriptive findings, overall performance, and research output. The results identify and quantify productivity and top impact, as well as collaboration patterns.
The search terms resulted in a total of 566 publications, spanning 54 years (1966 to mid-2020). Figure 1a shows that 78% of the publications are from peer-reviewed sources, like journal papers, book chapters, and theses.
A total of 33 countries published work on VI, reflecting an international research network. The USA generates the most publications and consequently has the most authors and related organisations, involved in a wide variety of published sources. Table 1 reveals that the USA is the most productive when it comes to generating knowledge on the topic of VI, followed by Canada, China, the Netherlands, Italy, and Australia.
A total of 437 organisations were involved, of which 143 (33%) were universities or schools. The most productive organisation is the (USA) Environmental Protection Agency, closely followed by Brown University, Arizona State University, and Geosyntec Consultants Inc.
The organisations with the most authors involved are the Environmental Protection Agency and the University of California, followed by Geosyntec Consultants Inc., the National Institute of Public Health and the Environment, Zhejiang University and Groundwater Services Inc. Brown University published the most papers in scientific journals.

Collaboration
Figures 2 to 5 below display the scientific cooperation between organisations, authors, and countries. To visualise the cooperation between organisations and countries, the number of publications was aggregated.
Figure 2 legend: the point markers represent the geographical location of each organisation based on its longitude and latitude coordinates; the lines (▬) represent cooperation through a jointly authored publication. The lower two images provide an enlargement for the USA plus Canada, and for Europe.
Figure 2 shows that most organisations are situated in the USA and Canada. Cooperation between European or Asian countries is limited. Cooperation occurs mostly between the USA and Canada; inter-continental cooperation arises where both countries cooperate with affiliates in the EU or Asia.
[...] figure 3a where the edge weight is above 1, meaning that the organisations collaborated more than once, thereby highlighting the key organisations in the network. The organisations with a high node degree (cooperation) are not necessarily those with authority in the network.
The underlying research networks between organisations and researchers are reflected in the network of countries. Figure 5 shows the collaboration between countries. A darker red node (higher node degree) indicates that the country cooperates with more countries, while the blue outline stroke reveals the authority of a country in the network. The figure reveals that the USA collaborates most with other countries, specifically with China and Canada, given the width of the edges. Belgium and the Netherlands also collaborate frequently on the topic of VI. The USA, China, Canada and Italy are considered authorities in the country network and collaboration.

Discussion and Conclusions
This study investigates scientific bibliometric authorship for VI research to assess collaboration trends between (identified lead) scientists, organisations, and countries. External factors influence research collaboration and scientific productivity and are addressed to the extent possible.
The data suggests that, over a period of 54 years, the global research network for VI produced 566 publications (figure 1) through 157 sources (table 2). Three-quarters of the publications come from sources that apply a peer-review process, and around 50% of the work on VI was published in the last 10 years (figure 1). The research network on VI is composed of 437 organisations (table 4) and 1053 authors (table 3) from 33 countries (table 1). This suggests an increasingly active international collaborative research effort for VI. However, intercontinental cooperation (e.g. USA-China or Canada-United Kingdom) is much less frequent than continental cooperation (e.g. USA-Canada or Belgium-Netherlands) (figure 2). A factor affecting cooperation is the importance given to the exposure route of VI in the various legislative frameworks for CLM. Since the Love Canal tragedy in 1978 (Phillips, Hung & Bosela, 2007) and the BKK landfill in West Covina, southern California (Wood & Porter, 1987), VI started receiving attention in the context of CLM in the USA, resulting in the Superfund programme. Around ten years later European countries started regulating CLM: Norway enacted the Pollution Control Act in 1981, Greece enacted the Environmental Law in 1986, and the Netherlands brought the Soil Protection Act into force in 1987 (Ferguson, 1999). The regulation of CLM is likely linked to the funding made available for VI research. As a result, the five most central countries are the USA, followed by Canada, China, the Netherlands, and Italy (table 1). The researchers with the most publications (table 3) and the top organisations (table 4) are from these five countries. The same applies to the centrality measures (figure 3a/b, 4a/b, and 5).
Measures such as the degree centrality are a proxy for cooperation in a network but do not necessarily represent the volume of published work. For example, the United Kingdom is amongst the countries with the highest node degree (figure 5) but is behind the Netherlands and Italy in the number of publications (table 1). Despite Canada publishing more papers than China (66 and 14 publications, respectively) (table 1) [...]. This provides them with some influence and control of information.
The SNA conducted gives a good representation of the network structure and of the key countries, organisations and researchers involved, and provides suggestions for further research. Since 2010, the research community has become more stable. Further temporal analysis could provide insights into the evolution and changes of the research network. Additional data can broaden the perspective on VI research, for example by adding authors from literature references or by collecting data to determine what drives researchers and how they maintain their network (Laudel, 2002).
Although evidence of successful SNA-based policies is limited, some examples highlight the potential use (Morel et al., 2009; Bender et al., 2015; Eslami, Ebadi & Schiffauerova, 2013; Lander, 2013). The results from this SNA show:
1.) Fragmentation of the network (figure 3a/4a) and therefore a need to consolidate cooperation amongst researchers, organisations, and countries. The countries with the most published work and collaborations are high-income economies (figure 5); hence the need arises to cooperate with middle- and low-income countries so as to transfer knowledge on CLM and the VI pathway.
2.) Few organisations are central in the network and consolidate knowledge (figure 3a/b). Universities created a network in which they cooperate with public bodies (Environmental Protection Agency, Beijing Municipal Institute of Environmental Protection) and, in the USA, also with consultancy companies. The latter are prominent in the network (Groundwater Services Inc., Geosyntec Consultants Inc., Golder Associates Ltd.), as are commercial companies (Shell Global Solutions, Chevron Energy Technology Company, Groundswell Technologies Inc.). The cooperation happens mainly in the USA and Canada, and less in the EU and Asia.
3.) Country-specific settings or legislation for CLM can influence the cooperation in the VI network. In the USA the Superfund program (EPA, 2021) is responsible for financing the clean-up of the most contaminated land and requires organisations to cooperate to bring knowledge of many aspects together. The EU has funded several networks like NICOLE (Network for Industrially Co-ordinated Sustainable Land Management in Europe) and CLARINET (The Contaminated Land Rehabilitation Network for Environmental Technologies in Europe). The aim of the NICOLE network was to coordinate sustainable land management in the EU and to increase the cooperation between various players (academia, service providers and industry) for the development and application of sustainable technologies (NICOLE, 2002, 2012). CLARINET's primary objective was to develop technical recommendations for decision-making on the rehabilitation of contaminated land in the EU and, furthermore, to identify and report on research and development needs (CLARINET, 2002; Bardos, 2003). Pertaining to the exposure path of VI, the cooperation between industry and research is visible in the USA but not in the EU, despite the aim of the funded networks. Why this cooperation is not visible in the EU results needs further research.
SNA has proven to be a useful tool for retrieving the composition and cooperation in the VI network of researchers, organisations, and countries. However, SNA has limitations, as researchers do not only cooperate via published work. Cooperation does not imply knowledge sharing; however, authorship requires a level of cooperation beyond the exchange of information. Quantitative SNA is unable to analyse the reason and motivation for the network structure. This issue can be assessed by using qualitative SNA methods like interviews or a more in-depth data gathering on the cooperation (Kolleck, 2013).

Conflict of interest
The authors declare that they have no conflict of interest with the topic or company that distributes the software used in this paper.