Assortativity Analysis of Real-World Network Graphs Based on Centrality Metrics

Assortativity index (A. Index) of real-world network graphs has been traditionally computed based on the degree centrality metric and the networks were classified as assortative, dissortative or neutral if the A. Index values are respectively greater than 0, less than 0 or closer to 0. In this paper, we evaluate the A. Index of real-world network graphs based on some of the commonly used centrality metrics (betweenness, eigenvector and closeness) in addition to degree centrality and observe that the assortativity classification of real-world network graphs depends on the node-level centrality metric used. We also propose five different levels of assortativity (strongly assortative, weakly assortative, neutral, weakly dissortative and strongly dissortative) for real-world networks and the corresponding range of A. Index value for the classification. We analyze a collection of 50 real-world network graphs with respect to each of the above four centrality metrics and estimate the empirical probability of observing a real-world network graph to exhibit a particular level of assortativity. We claim that a real-world network graph is more likely to be neutral with respect to the betweenness and degree centrality metrics and more likely to be assortative with respect to the eigenvector and closeness centrality metrics.


Introduction
The assortativity index (A. Index) of a network is a measure of the similarity of the end vertices of the edges in the network with respect to a particular node-level metric (Newman, 2010). That is, the A. Index of a network is a measure of the extent to which a vertex with a higher value for a particular node-level metric is connected to another vertex that also has a higher value for the node-level metric. Since the A. Index is a correlation coefficient (Pearson's product-moment correlation coefficient) (Newman, 1999) quantifying the extent of similarity of the end vertices of the edges, its value ranges from -1 to 1 (Strang, 2006). Traditionally, in the literature (Newman, 1999), networks with positive values of A. Index (closer to 1) are referred to as assortative networks; networks with negative values of A. Index (closer to -1) are referred to as dissortative networks; and networks with A. Index values closer to 0 are classified as neutral. The similarity has typically been evaluated with respect to the degree centrality metric of the vertices, and the classification of networks (as assortative, dissortative or neutral) has so far been based only on the degree centrality metric (Newman & Girvan, 2003; Noldus & Van Mieghem, 2015).
Our hypothesis in this paper is that the assortativity classification of a real-world network could depend on the centrality metric used to compute the A. Index value of the network. In other words, a network could be classified as assortative with respect to one centrality metric and as dissortative or neutral with respect to another centrality metric. Also, just three levels (assortative, neutral and dissortative) are not sufficient to accurately assess the extent of assortativity of real-world network graphs whose A. Index values are close to neither 0 nor 1 or -1. Until now, a formal range of A. Index values has not been defined to assess the level of assortativity of real-world networks. In this paper, we propose to divide the range of values (-1.0 to 1.0) for the A. Index fairly evenly into five levels and set up the following rule: strongly assortative (0.6 ≤ A. Index ≤ 1.0), weakly assortative (0.2 ≤ A. Index < 0.6), neutral (-0.2 < A. Index < 0.2), weakly dissortative (-0.6 < A. Index ≤ -0.2) and strongly dissortative (-1.0 ≤ A. Index ≤ -0.6).
We investigate the validity of our hypothesis by analyzing a broad collection of 50 real-world networks whose spectral radius ratio for node degree (a measure of the variation in node degree) ranges from 1.01 to 3.48 (Meghanathan, 2014). We compute the A. Index values for these 50 real-world networks with respect to each of the four commonly used centrality metrics: degree centrality (DegC), eigenvector centrality (EVC; Bonacich, 1987), betweenness centrality (BWC; Freeman, 1977) and closeness centrality (ClC; Freeman, 1979), and apply the above proposed range of values to assess the assortativity levels of the real-world networks with respect to each of these four centrality metrics. For 40 of the 50 real-world networks analyzed, we observe that the assortativity level of the network (strongly or weakly assortative, neutral, strongly or weakly dissortative) depends on the centrality metric under consideration.
Since we have analyzed a large collection of networks with varying levels of complexity, we use the results of our assortativity analysis to empirically estimate the likelihood of a real-world network being classified as neutral or assortative (strongly assortative or weakly assortative) with respect to a particular centrality metric. Based on the results of our assortativity analysis on 50 real-world networks, we claim that any chosen real-world network is more likely (i.e., with a probability of 0.72) to be classified as neutral (neither assortative nor dissortative) with respect to the betweenness centrality metric, and more likely (i.e., with a probability of 0.66) to be classified as assortative (strongly or weakly) with respect to the ClC and EVC metrics. More specifically, a chosen real-world network is strongly assortative with a probability of 0.38 with respect to the ClC metric and weakly assortative, also with a probability of 0.38, with respect to the EVC metric.
To the best of our knowledge, no prior paper has conducted a comprehensive assortativity analysis of complex real-world networks with respect to the four commonly used centrality metrics or empirically estimated the likelihood of observing a real-world network to be neutral, strongly assortative or weakly assortative with respect to a particular centrality metric. The rest of the paper is organized as follows: Section 2 reviews the four centrality metrics along with an example to illustrate their computation on a sample graph. Section 3 introduces the formulation for the Assortativity Index (A. Index) and the proposed range of A. Index values to classify the assortativity level of a real-world network, and presents a motivating example to illustrate that the A. Index of a network and its classification (as neutral, strongly/weakly assortative or dissortative) could depend on the centrality metric under consideration. Section 4 introduces the 50 complex real-world networks that are analyzed in this paper. Section 5 presents the results of the assortativity analysis conducted on the real-world networks with respect to the four centrality metrics. Section 6 discusses related work and highlights the novel contribution of the work done in this paper. Section 7 concludes the paper. Throughout the paper, we use the terms 'node' and 'vertex', 'link' and 'edge', and 'network' and 'graph' interchangeably. All the real-world networks analyzed in this paper are modeled as undirected graphs.

Centrality Metrics
The four commonly used centrality metrics in complex network analysis are: degree centrality (DegC), eigenvector centrality (EVC; Bonacich, 1987), betweenness centrality (BWC; Freeman, 1977) and closeness centrality (ClC; Freeman, 1979). DegC and EVC are degree-based centrality metrics, whereas BWC and ClC are shortest path-based centrality metrics. Until now, the DegC metric has been typically used for assortativity analysis of real-world networks (Newman & Girvan, 2003; Noldus & Van Mieghem, 2015). In this paper, we are interested in conducting assortativity analysis of real-world networks with respect to all four of the above centrality metrics. In this section, we briefly review these four centrality metrics and the procedure to compute them, along with an example for each.

Degree Centrality
The degree centrality (DegC) of a vertex is the number of edges incident on the vertex. The DegC of the vertices is computed by multiplying the adjacency matrix of the graph with a vector of all 1s (with one entry per vertex in the graph). Figure 1 illustrates an example to compute the degree centrality of the vertices. As can be noticed from this example, the DegC metric is prone to several ties among the vertices (as the metric values are integers and not real numbers).
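The matrix-vector product described above can be sketched in a few lines of Python. This is a minimal illustration only: the 5-vertex edge list `EDGES` is a hypothetical graph (it is not the graph of Figure 1), and the helper names are assumptions introduced for this sketch.

```python
# Hypothetical 5-vertex undirected graph (NOT the sample graph of Figure 1).
EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
N = 5


def adjacency_matrix(n, edges):
    """Build the symmetric 0/1 adjacency matrix of an undirected graph."""
    a = [[0] * n for _ in range(n)]
    for u, v in edges:
        a[u][v] = a[v][u] = 1
    return a


def degree_centrality(a):
    """Multiply the adjacency matrix by the all-ones vector."""
    ones = [1] * len(a)
    return [sum(row[j] * ones[j] for j in range(len(a))) for row in a]


deg_c = degree_centrality(adjacency_matrix(N, EDGES))
print(deg_c)  # [2, 2, 3, 2, 1]
```

Vertices 0, 1 and 3 all end up with a degree of 2, which illustrates the ties that integer-valued degree centrality tends to produce.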

Eigenvector Centrality
The eigenvector centrality (EVC) of a vertex is a measure of the degree of the vertex as well as the degree of its neighbors. The EVC values of the vertices in a graph correspond to the entries in the principal eigenvector of the adjacency matrix of the graph. We use the JAMA: A Java Matrix package (http://math.nist.gov/javanumerics/jama/) to compute the principal eigenvector of the adjacency matrix of a real-world network graph. The entries in the principal eigenvector can also be computed using the Power-Iteration method (Strang, 2006) that is illustrated in Figure 2. The tentative eigenvector $X_{i+1}$ of a network graph at the end of the $(i+1)$th iteration is given by $X_{i+1} = \frac{A X_i}{\|A X_i\|}$, where $\|A X_i\|$ is the norm of the product of the adjacency matrix $A$ and the tentative eigenvector $X_i$ at the end of the $i$th iteration. We continue the iterations until the norm of the product vector converges; the tentative eigenvector at that juncture corresponds to the principal eigenvector of the adjacency matrix of the graph. As the EVC values of the vertices are likely to be real numbers and are dependent on the degree of a vertex as well as the degrees of its neighbors, the EVC values of the vertices are more likely to be unique and relatively fewer ties are incurred (compared to degree centrality).
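As a rough sketch of the Power-Iteration method described above (not the JAMA-based computation used for the results in this paper), the following Python code repeatedly multiplies the adjacency matrix by the tentative eigenvector and normalizes the product. The example graph, the function name `power_iteration`, the starting vector and the tolerance are all assumptions made for illustration; the graph is assumed to be connected, non-bipartite and to have at least one edge.

```python
import math

# Hypothetical connected, non-bipartite graph (NOT the graph of Figure 2).
EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
N = 5

adjacency = [[0.0] * N for _ in range(N)]
for u, v in EDGES:
    adjacency[u][v] = adjacency[v][u] = 1.0


def power_iteration(a, tol=1e-12, max_iter=100000):
    """Return (principal eigenvector, principal eigenvalue estimate)."""
    n = len(a)
    x = [1.0 / math.sqrt(n)] * n          # normalized starting vector
    prev_norm, norm = 0.0, 0.0
    for _ in range(max_iter):
        # Product of the adjacency matrix and the tentative eigenvector.
        y = [sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(val * val for val in y))
        x = [val / norm for val in y]     # X_{i+1} = A X_i / ||A X_i||
        if abs(norm - prev_norm) < tol:   # stop once the norm has converged
            break
        prev_norm = norm
    return x, norm


evc, eigenvalue = power_iteration(adjacency)
print([round(val, 3) for val in evc], round(eigenvalue, 3))
```

In this hypothetical graph, vertex 2 (the vertex of highest degree, which also belongs to the triangle) receives the largest EVC value, and the converged norm approximates the principal eigenvalue.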

Betweenness Centrality
The betweenness centrality (BWC) of a vertex is a measure of the fraction of shortest paths going through the vertex when considered across all pairs of vertices in the graph (Freeman, 1977). If $sp_{jk}(i)$ is the number of shortest paths between vertices j and k that go through vertex i and $sp_{jk}$ is the total number of shortest paths between vertices j and k, then $BWC(i) = \sum_{j \neq k \neq i} \frac{sp_{jk}(i)}{sp_{jk}}$. The BWC of the vertices is computed using a Breadth First Search (BFS; Cormen et al., 2009)-based implementation of Brandes' algorithm (Brandes, 2001). We use the BFS algorithm to determine the shortest path trees rooted at each vertex and thereby deduce the level numbers of the vertices in the shortest path trees rooted at every vertex in the graph. The level number of a vertex i in the shortest path tree rooted at vertex j is the minimum number of hops from vertex j to i. The root vertex of a shortest path tree is said to be at level 0. The number of shortest paths from a vertex j to itself is 1. The number of shortest paths from a vertex j to a vertex k (at level l in the shortest path tree rooted at vertex j) is the sum of the number of shortest paths from vertex j to each of the vertices that are neighbors of vertex k in the graph and located at level l-1 in the shortest path tree rooted at j. The number of shortest paths between vertices j and k that go through vertex i is the product of the number of shortest paths from vertex j to vertex i and the number of shortest paths from vertex k to vertex i, provided vertex i lies on a shortest path between j and k (i.e., the level of i in the tree rooted at j plus the level of i in the tree rooted at k equals the level of k in the tree rooted at j); otherwise, it is zero. Figure 3 illustrates an example for the computation of the BWC of the vertices in the same sample graph used in Figures 1-2. We notice that vertices with a high degree and/or EVC need not have a high BWC and vice-versa. For example, vertices 0 and 2, which had the largest values for the EVC metric, have relatively low BWC values, whereas vertex 4 (with a low degree and low EVC) has the largest value for the BWC.
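The BFS-based counting of shortest paths described above can be sketched as follows. For readability, this sketch sums the pairwise fractions directly instead of using the dependency-accumulation step of Brandes' algorithm, so it should be read as an illustration of the definition rather than as the implementation used in this paper. The example graph (a 4-cycle with a pendant vertex) and all function names are hypothetical.

```python
from collections import deque

# Hypothetical graph: 4-cycle 0-1-3-2-0 plus a pendant vertex 4 attached to 3.
ADJ = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}


def bfs_levels_and_counts(adj, root):
    """Level numbers and number of shortest paths from root to each vertex."""
    level, count = {root: 0}, {root: 1}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in level:            # first visit fixes the level of w
                level[w] = level[u] + 1
                count[w] = 0
                queue.append(w)
            if level[w] == level[u] + 1:  # u is a predecessor of w
                count[w] += count[u]
    return level, count


def betweenness_centrality(adj):
    """Sum of sp_jk(i) / sp_jk over all unordered pairs {j, k} not containing i."""
    nodes = sorted(adj)
    info = {v: bfs_levels_and_counts(adj, v) for v in nodes}
    bwc = {v: 0.0 for v in nodes}
    for pos, j in enumerate(nodes):
        level_j, count_j = info[j]
        for k in nodes[pos + 1:]:
            if k not in level_j:          # j and k are disconnected
                continue
            level_k, count_k = info[k]
            for i in nodes:
                if i in (j, k) or i not in level_j:
                    continue
                # i lies on a shortest j-k path iff the levels add up.
                if level_j[i] + level_k[i] == level_j[k]:
                    bwc[i] += count_j[i] * count_k[i] / count_j[k]
    return bwc


bwc = betweenness_centrality(ADJ)
print(bwc)  # {0: 0.5, 1: 1.0, 2: 1.0, 3: 3.5, 4: 0.0}
```

Vertices 1 and 2 each carry half of the two shortest paths between vertices 0 and 3 and between vertices 0 and 4, which is where the fractional contributions come from.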

Closeness Centrality
The closeness centrality (ClC) of a vertex (Freeman, 1979) is a measure of the relative closeness of the vertex to the rest of the vertices in the graph. The ClC of a vertex is measured by running the BFS algorithm on the vertex and determining the minimum number of hops to each vertex on the shortest path tree rooted at the vertex. The ClC of a vertex is the inverse of the sum of the shortest path lengths (hop counts) to the rest of the vertices in the graph. Figure 4 illustrates an example for the calculation of the ClC of the vertices on the same sample graph used in Figures 1-3.
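A minimal Python sketch of this BFS-based computation is given below. The example graph and the function names are hypothetical (the graph is not the one in Figure 4), and the sketch assumes a connected graph with at least two vertices so that every hop count is defined and the sum is non-zero.

```python
from collections import deque

# Hypothetical connected graph (NOT the sample graph of Figure 4).
ADJ = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3]}


def hop_counts(adj, root):
    """Minimum number of hops from root to every vertex (BFS levels)."""
    hops = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in hops:
                hops[w] = hops[u] + 1
                queue.append(w)
    return hops


def closeness_centrality(adj):
    """Inverse of the sum of hop counts to the rest of the vertices."""
    return {v: 1.0 / sum(hop_counts(adj, v).values()) for v in adj}


clc = closeness_centrality(ADJ)
print({v: round(val, 3) for v, val in clc.items()})
# {0: 0.143, 1: 0.143, 2: 0.2, 3: 0.167, 4: 0.111}
```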

Network Model
Let G = (V, E) be the graph, with vertex set V and edge set E, representing a real-world network, and let C(i) be the value of a centrality metric (C) for any node i in the network. We refer to the first vertex (vertex u) in an edge (u, v) as the upstream vertex and the second vertex (vertex v) in an edge (u, v) as the downstream vertex. As the focus of this research is on undirected graphs, we adopt the following convention to represent the edges: the ID of the upstream vertex of an edge (u, v) is always less than the ID of the downstream vertex of the edge (i.e., u < v). Let U and D be respectively the collections of upstream and downstream vertices constituting the edges of the graph (with one entry per edge). Let $\bar{U}_C$ and $\bar{D}_C$ (calculated as in formulation-1 below) be respectively the average values for the centrality metric of interest among the vertices constituting U and D.

$$\bar{U}_C = \frac{1}{|E|} \sum_{(u,v) \in E} C(u), \qquad \bar{D}_C = \frac{1}{|E|} \sum_{(u,v) \in E} C(v) \qquad (1)$$

Assortativity Index
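The Assortativity Index (A. Index) of a network (Newman, 1999) with respect to a particular node-level centrality metric is a quantitative measure of the extent of similarity of the end vertices of the edges with respect to the chosen centrality metric. The extent of similarity is calculated as Pearson's Product-Moment Correlation Coefficient (Strang, 2006) between the centrality values of the upstream vertices (U) and the downstream vertices (D) constituting the end vertices of the edges in a real-world network graph. Accordingly, the A. Index of a network with respect to a centrality metric C could be formulated as below, where $\bar{U}_C$ and $\bar{D}_C$ are the averages of C over the upstream and downstream end vertices of the edges.

$$\mathrm{A.Index}_C = \frac{\sum_{(u,v) \in E} \left(C(u) - \bar{U}_C\right)\left(C(v) - \bar{D}_C\right)}{\sqrt{\sum_{(u,v) \in E} \left(C(u) - \bar{U}_C\right)^2}\;\sqrt{\sum_{(u,v) \in E} \left(C(v) - \bar{D}_C\right)^2}} \qquad (2)$$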
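To make the formulation concrete, the sketch below computes the Pearson correlation between the centrality values of the upstream and downstream end vertices of the edges, with each edge oriented so that the upstream ID is the smaller one. The 4-vertex path graph and the function names are hypothetical and chosen only because the arithmetic is easy to check by hand; the sketch assumes that the centrality values are not all identical on either side (otherwise the denominator is zero and the index is undefined).

```python
import math


def degree_centrality(n, edges):
    """Number of edges incident on each vertex."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def assortativity_index(edges, centrality):
    """Pearson correlation of centrality over (upstream, downstream) pairs."""
    up = [centrality[min(u, v)] for u, v in edges]      # upstream: smaller ID
    down = [centrality[max(u, v)] for u, v in edges]    # downstream: larger ID
    m = len(edges)
    mean_up, mean_down = sum(up) / m, sum(down) / m
    cov = sum((a - mean_up) * (b - mean_down) for a, b in zip(up, down))
    var_up = sum((a - mean_up) ** 2 for a in up)
    var_down = sum((b - mean_down) ** 2 for b in down)
    return cov / math.sqrt(var_up * var_down)


# Hypothetical path graph 0 - 1 - 2 - 3 with degrees [1, 2, 2, 1].
PATH_EDGES = [(0, 1), (1, 2), (2, 3)]
a_index = assortativity_index(PATH_EDGES, degree_centrality(4, PATH_EDGES))
print(round(a_index, 2))  # -0.5
```

Here the upstream values are (1, 2, 2) and the downstream values are (2, 2, 1), both with mean 5/3, so the numerator is -1/3 and each sum of squares is 2/3, giving an index of -0.5. Any other centrality vector (EVC, BWC or ClC values) could be passed in place of the degree list.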

Range of Values for Assortativity Classification
As the Assortativity Index is a measure of the level of correlation between the upstream and downstream vertices constituting the edges in a network graph, the values for A. Index_C with respect to any centrality metric (C) range from -1 to 1. Until now in the literature, a network is considered to be assortative (dissortative) with respect to the chosen node-level metric (C) if the A. Index_C value is closer to 1 (-1). If the A. Index_C value is closer to 0, the network is considered to be neutral with respect to the metric C. However, we do not have a formally defined range of values that clearly indicates how the network should be classified if the A. Index_C value is close to neither 1 or -1 nor 0. We seek to address this concern as follows: since A. Index_C is evaluated as a measure of correlation, we adapt the range of correlation coefficient values (rounded to two decimals) proposed in the literature (Evans, 1995) for the level of correlation (shown in Table 1) and propose the range of assortativity index values (shown in Table 2) for classifying a network with respect to the level of assortativity. We propose only two levels of assortativity and two levels of dissortativity (rather than 5 levels for each) to give enough space for the range of A. Index values to classify a network at a particular level, including neutral (i.e., neither assortative nor dissortative), while still being able to differentiate a strongly assortative (dissortative) network from a weakly assortative (dissortative) network or a neutral network with respect to a node-level metric. The color code used for the various levels of assortativity is also shown in Table 2.
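The five-level rule can be expressed as a small classification function. The sketch below uses the thresholds stated in the Introduction (0.6, 0.2, -0.2 and -0.6), on the assumption that they match the ranges of Table 2, which is not reproduced here; the function name `assortativity_level` is hypothetical.

```python
def assortativity_level(a_index):
    """Map an A. Index value in [-1, 1] to one of the five proposed levels."""
    if not -1.0 <= a_index <= 1.0:
        raise ValueError("A. Index must lie between -1 and 1")
    if a_index >= 0.6:
        return "strongly assortative"      # 0.6 <= A. Index <= 1.0
    if a_index >= 0.2:
        return "weakly assortative"        # 0.2 <= A. Index < 0.6
    if a_index > -0.2:
        return "neutral"                   # -0.2 < A. Index < 0.2
    if a_index > -0.6:
        return "weakly dissortative"       # -0.6 < A. Index <= -0.2
    return "strongly dissortative"         # -1.0 <= A. Index <= -0.6


print(assortativity_level(-0.22))  # weakly dissortative
print(assortativity_level(0.81))   # strongly assortative
```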

Motivating Example
In this subsection, we illustrate the computation of the assortativity index of the sample graph used in Figures 1-4 with respect to the degree centrality (Figure 5) and eigenvector centrality (Figure 6) metrics. Adopting the proposed range of classification for the level of assortativity, we notice that the sample graph of Figures 1-4 could be classified as "weakly dissortative" (A. Index value of -0.22; see Figure 5) with respect to the degree centrality metric and "strongly assortative" (A. Index value of 0.81; see Figure 6) with respect to the eigenvector centrality metric. This example supports our hypothesis that the assortativity level classification for a network could vary depending on the centrality metric used to compute the A. Index values.

Real-World Network Graphs
We now present a brief overview of the 50 real-world network graphs analyzed in this paper. We model each network as an undirected graph of nodes and edges. The networks are identified with a unique ID (1, ..., 50) and a three-character acronym. We use the spectral radius ratio for node degree (Meghanathan, 2014) to capture the extent of variation in the degree of the nodes: the spectral radius ratio for node degree is the ratio of the principal eigenvalue (Strang, 2006) of the adjacency matrix of the network graph to the average node degree. The values for the spectral radius ratio for node degree are 1 or above; the farther the value is from 1, the larger the variation in node degree. We analyze real-world networks ranging from random networks to scale-free networks, as the spectral radius ratio for node degree of the real-world networks analyzed in this paper ranges from 1.01 to 3.48. Table 3 lists the 50 networks along with the number of nodes and edges, the spectral radius ratio for node degree (λ_sp) and the average degree (k_avg). A brief description of the networks is as follows:
1) Word Adjacency Network (ADJ; Newman, 2006): This is a network of 112 words (adjectives and nouns, represented as vertices) in the novel David Copperfield by Charles Dickens; there exists an edge between two vertices if the corresponding words appeared adjacent to each other at least once in the novel.
2) Anna Karenina Network (AKN; Knuth, 1993): This is a network of 140 characters (vertices) in the novel Anna Karenina; there exists an edge between two vertices if the corresponding characters have appeared together in at least one scene in the novel.
3) Jazz Band Network (JBN; Gleiser & Danon, 2003): This is a network of 198 Jazz bands (vertices) that recorded between the years 1912 and 1940; there exists an edge between two bands if they shared at least one musician in any of their recordings during this period.
4) C. Elegans Neural Network (CEN; White et al., 1986): This is a network of 297 neurons (vertices) in the neural network of the hermaphrodite Caenorhabditis elegans; there is an edge between two vertices if the corresponding neurons interact with each other (in the form of chemical synapses, gap junctions and neuromuscular junctions).
5) Centrality Literature Network (CLN; Hummon et al., 1990): This is a network of 118 papers (vertices) published on the topic of centrality in complex networks from 1948 to 1979. There is an edge between two vertices v_i and v_j if one of the corresponding papers has cited the other paper as a reference.
6) Citation Graph Drawing Network (CGD; Biedl & Franz, 2001): This is a network of 259 papers (vertices) that were published in the Proceedings of the Graph Drawing (GD) conferences from 1994 to 2000 and cited in the papers published in the GD'2001 conference. There is an edge between two vertices v_i and v_j if one of the corresponding papers has cited the other paper as a reference.
7) Copperfield Network (CFN; Knuth, 1993): This is a network of 89 characters in the novel David Copperfield by Charles Dickens; there exists an edge between two vertices if the corresponding characters appeared together in at least one scene in the novel.
8) Dolphin Network (DON; Lusseau et al., 2003): This is a network of 62 dolphins (vertices) that lived in the Doubtful Sound fiord of New Zealand; there is an edge between two vertices if the corresponding dolphins were seen moving with each other during the observation period.
9) Drug Network (DRN; Lee, 2004): This is a network of 212 drug agents (vertices) of different ethnicities. There is a link between two vertices if the corresponding agents know each other.
10) Dutch Literature 1976 Network (DLN; Nooy, 1999): This is a network of 37 Dutch literary authors and critics (vertices) in 1976; there exists an edge between two vertices v_i and v_j if the person corresponding to one of them is a critic who made a judgment (through a review or interview) on the literature work of the author corresponding to the other vertex.
There is an edge between two nodes if the corresponding authors have co-authored at least one publication.
12) Faux Mesa High School Friendship Network (FMH; Resnick et al., 1997): This is a network of 147 students (vertices) at a high school community in the rural western part of the US; there exists an edge between two vertices if the corresponding students are friends of each other.
13) Friendship Ties in a Hi-Tech Firm (FHT; Krackhardt, 1999): This is a network of 33 employees (vertices) of a small hi-tech computer firm that sells, installs and maintains computer systems; there exists an edge between two vertices v_i and v_j if the employee corresponding to at least one of them considers the employee corresponding to the other vertex as a personal friend.
14) Flying Teams Cadet Network (FTC; Moreno, 1960): This is a network of 48 cadet pilots (vertices) at a US Army Air Forces flying school in 1943, where the cadets were trained in a two-seated aircraft; there exists an edge between two vertices v_i and v_j if the pilot corresponding to at least one of them has indicated the pilot corresponding to the other vertex as a preferred partner with whom s/he likes to fly during the training schedules.
15) US Football Network (FON; Girvan & Newman, 2002): This is a network of 115 football teams (nodes) of US universities that played in the Fall 2000 season; there is an edge between two nodes if the corresponding teams have played against each other in the league games.
16) College Dorm Fraternity Network (CDF; Bernard et al., 1980): This is a network of 58 residents (vertices) in a fraternity at a West Virginia college; there exists an edge between two vertices if the corresponding residents were seen in a conversation at least once during a five-day observation period.
17) GD'96 Network (GD96; Batagelj & Mrvar, 2006): This is a network of 180 AT&T and other WWW websites (vertices) that were cited in the proceedings of the Graph Drawing (GD) conference in 1996; there exists an edge between two vertices if the website corresponding to one of them has a link to the website corresponding to the other vertex.
18) Marvel Universe Network (MUN; Gleiser, 2007): This is a collaborative network of 167 characters (vertices) in the comic books published by the Marvel Universe publishing company; there exists an edge between two vertices if the corresponding characters had appeared together in at least one publication.
19) Graph and Digraph Glossary Network (GLN; Batagelj & Mrvar, 2006): This is a network of 67 terms (vertices) that appeared in the glossary prepared by Bill Cherowitzo on Graph and Digraph; there exists an edge between two vertices if the term corresponding to one of them is used to describe the meaning of the term corresponding to the other vertex.
There exists an edge between two vertices if the corresponding conference visitors had face-to-face contact that was active for at least 20 seconds.
22) Huckleberry Coappearance Network (HCN; Knuth, 1993): This is a network of 76 characters (vertices) that appeared in the novel Huckleberry Finn by Mark Twain; there is an edge between two vertices if the corresponding characters had a common appearance in at least one scene.
23) Infectious Socio-patterns Network (ISP; Isella et al., 2011): This is a network of 309 visitors (vertices) who visited the Science Gallery in Dublin, Ireland during Spring 2009. There existed an edge between two vertices if the corresponding visitors had a continuous face-to-face contact for at least 20 seconds when they participated in the Infectious Socio-patterns event (an electronic simulation of the spreading of an epidemic through individuals who are in close proximity) as part of an art science exhibition.
24) Karate Club Network (KCN; Zachary, 1977): This is a network of 34 members (nodes) of a Karate Club at a US university in the 1970s; there is an edge between two nodes if the corresponding members were seen interacting with each other during the observation period.
25) Korea Family Planning Network (KFP; Rogers & Kincaid, 1980): This is a network of 37 women (vertices) at a Mothers' Club in Korea; there existed an edge between two vertices if the corresponding women were seen discussing family planning methods during an observation period.
26) Les Miserables Network (LMN; Knuth, 1993): This is a network of 77 characters (nodes) in the novel Les Miserables; there exists an edge between two nodes if the corresponding characters appeared together in at least one of the chapters in the novel.
27) Macaque Dominance Network (MDN; Takahata, 1991): This is a network of 62 adult female Japanese macaques (monkeys; vertices) in a colony, known as the "Arashiyama B Group", recorded during the non-mating season from April to early October, 1976. There existed an edge between two vertices if a macaque corresponding to one of them was recorded to have exhibited dominance over the macaque corresponding to the other vertex.
Batagelj & Mrvar, 2006): This is a network of 35 teams (vertices) that participated in the 1998 edition of the Soccer World Cup. A player for a national team could sometimes have a contract with one or more other countries. In this network, there is an edge between two vertices if the national team corresponding to at least one of them has contracted players from the country represented by the national team corresponding to the other vertex.

28) Madrid
42) Sawmill Strike Communication Network (SSM; Michael, 1997): This is a network of 24 employees (vertices) in a sawmill who planned a strike against the new compensation package proposed by their management. There exists an edge between any two vertices if the corresponding employees mutually admitted discussing the strike with a frequency of three or more (on a 5-point scale).
43) Taro Exchange Network (TEN; Schwimmer, 1973): This is a network of 22 families (vertices) in a Papuan village. There exists an edge between two vertices if the corresponding families were seen exchanging gifts during the observation period.
44) Teenage Female Friendship Network (TWF; Pearson & Michell, 2000): This is a network of 47 female teenage students (vertices) who studied as a cohort in a school in the West of Scotland from 1995 to 1997. There exists an edge between two vertices if the corresponding students reported (in a survey) that they were best friends of each other.
45) UK Faculty Friendship Network (UKF; Nepusz et al., 2008): This is a network of 83 faculty (vertices) at a UK university. There exists an edge between two vertices if the corresponding faculty are friends of each other.

46) US Airports 1997 Network (APN; Batagelj & Mrvar, 2006): This is a network of 332 airports (vertices) in the US in the year 1997. There is an edge between two nodes if there is a direct flight connection between the corresponding airports.
47) US States Network (USS): This is a network of the 48 contiguous states in the US and the District of Columbia (DC). Each of the 48 states and DC is a node and there is an edge involving two nodes if the corresponding states (or DC) have a common border between them.
48) Residence Hall Friendship Network (RHF; Freeman et al., 1998): This is a network of 217 residents (vertices) living at a residence hall located on the Australian National University campus. There exists an edge between two vertices if the corresponding residents are friends of each other.
49) Windsurfers Beach Network (WSB; Freeman et al., 1989): This is a network of 43 windsurfers (vertices) on a beach in southern California during Fall 1986. There exists an edge between two vertices if the corresponding windsurfers were perceived to be close to each other (determined based on a survey).
50) World Trade Metal Network (WTN; Smith & White, 1992): This is a network of 80 countries (vertices) that were involved in trading miscellaneous metals during the period from 1965 to 1980. There exists an edge between two vertices if one of the two corresponding countries imported miscellaneous metals from the country corresponding to the other vertex.
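The spectral radius ratio for node degree used to characterize these networks can be sketched as the principal eigenvalue of the adjacency matrix divided by the average node degree. The code below estimates the principal eigenvalue with a simple power iteration; the two example graphs (a complete graph on 4 vertices and a star with 4 leaves), the function names and the tolerance are hypothetical choices for illustration, and the graph is assumed to be connected with at least one edge.

```python
import math


def principal_eigenvalue(a, tol=1e-12, max_iter=100000):
    """Estimate the largest eigenvalue of a symmetric adjacency matrix."""
    n = len(a)
    x = [1.0 / math.sqrt(n)] * n
    prev_norm, norm = 0.0, 0.0
    for _ in range(max_iter):
        y = [sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(val * val for val in y))
        x = [val / norm for val in y]
        if abs(norm - prev_norm) < tol:
            break
        prev_norm = norm
    return norm


def spectral_radius_ratio(n, edges):
    """Principal eigenvalue of the adjacency matrix over the average degree."""
    a = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        a[u][v] = a[v][u] = 1.0
    average_degree = 2.0 * len(edges) / n
    return principal_eigenvalue(a) / average_degree


# Hypothetical examples: a regular graph versus a hub-dominated graph.
K4_EDGES = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
STAR_EDGES = [(0, 1), (0, 2), (0, 3), (0, 4)]
ratio_k4 = spectral_radius_ratio(4, K4_EDGES)
ratio_star = spectral_radius_ratio(5, STAR_EDGES)
print(round(ratio_k4, 2), round(ratio_star, 2))  # 1.0 1.25
```

Every vertex of the complete graph has the same degree, so its ratio is 1; the star has one hub and four leaves, and its ratio moves away from 1, in line with the interpretation of the ratio as a measure of the variation in node degree.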

Results of Assortativity Analysis
We now present the A. Index values obtained for each of the 50 real-world network graphs (listed in Section 4) with respect to each of the four centrality metrics (introduced in Section 2). Table 4 lists the A. Index values, color coded as per the ranges outlined in Table 2. For 80% of the real-world networks analyzed (i.e., for 40 of the 50 networks), the level of assortativity is not the same for all four centrality metrics. For a majority (i.e., 56%) of the real-world networks (i.e., for 28 of the 50 networks), we observe two different levels of assortativity, and most of these are the neutral and weakly assortative levels. For very few real-world networks, the two different levels of assortativity represent levels whose ranges of assortativity index values are not contiguous (for example: neutral and strongly assortative). For 24% of the real-world networks analyzed (i.e., 12 of the 50 networks), we observe three levels of assortativity. For none of the real-world networks do we observe four different levels of assortativity (i.e., one assortativity level per centrality metric). Only 6-14% of the real-world networks are either weakly or strongly dissortative with respect to any single centrality metric.
We also plot (Figures 7-10) the distribution of the A. Index values for each of the four centrality metrics. We estimate the probability of observing a network to be at a particular level of assortativity with respect to a centrality metric as the fraction of the total number of real-world networks exhibiting the particular level of assortativity with respect to the centrality metric. These empirically estimated probability values are also listed in Figures 7-10. As a high-level conclusion, we could say that there is at least a 50% chance for a real-world network to be neutral (neither assortative nor dissortative) with respect to the degree centrality and betweenness centrality metrics. On the other hand, we observe that there is at least a 50% chance for a real-world network to be assortative (either strongly assortative or weakly assortative) with respect to the closeness centrality and eigenvector centrality metrics.
More specifically, we observe a real-world network to be neutral with respect to the BWC and DegC metrics with a probability of 0.72 and 0.58 respectively. With respect to the EVC metric, we observe a real-world network to be weakly assortative with a probability of 0.38 and strongly assortative with a probability of 0.28. With respect to the ClC metric, we observe a real-world network to be strongly assortative with a probability of 0.38 and weakly assortative with a probability of 0.28. Note that though both BWC and ClC are shortest path-based centrality metrics, they differ sharply with respect to assortativity: a real-world network is more likely to be neutral (neither assortative nor dissortative) with respect to the BWC metric, but more likely to be strongly or weakly assortative with respect to the ClC metric. Table 5 summarizes these empirically estimated probability values for all five levels of assortativity and all four centrality metrics. Figure 11 presents a pictorial view of the empirically estimated probability of observing a real-world network at a particular level of assortativity with respect to a centrality metric.
An interesting and significant observation from the color-coded Table 3 concerns the real-world networks that exhibit two or three levels of assortativity across the centrality metrics: their level of assortativity typically transitions from dissortative to neutral, or from neutral to weakly assortative to strongly assortative, when the centrality metrics are considered in this order: BWC, DegC, EVC and ClC. We also notice from Figures 7-10 that the distribution of the A. Index values gradually drifts from a predominantly neutral-level distribution (for the BWC and DegC metrics) to a predominantly assortative-level distribution (for the EVC and ClC metrics). These observations further support our conclusions (in the previous paragraphs) regarding the probability of observing a real-world network to be neutral, weakly assortative or strongly assortative with respect to the centrality metrics.
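The empirical estimate described above (the fraction of networks that fall at each level of assortativity for a given centrality metric) can be sketched as follows. This is a minimal illustration and not the procedure used to produce Table 5: the cut-off values in LEVELS are hypothetical placeholders standing in for the ranges the paper proposes, and the names assortativity_level and empirical_probabilities, as well as the sample values, are made up for the example.

```python
from collections import Counter

# Hypothetical lower bounds for four of the five levels (illustration only);
# the actual ranges are the ones proposed in the paper.
LEVELS = [
    (0.6, "strongly assortative"),
    (0.2, "weakly assortative"),
    (-0.2, "neutral"),
    (-0.6, "weakly dissortative"),
]


def assortativity_level(a_index, levels=LEVELS):
    """Map an A. Index value in [-1, 1] to one of five levels."""
    for lower_bound, name in levels:
        if a_index >= lower_bound:
            return name
    # Anything below the last lower bound falls in the fifth level.
    return "strongly dissortative"


def empirical_probabilities(a_index_values, levels=LEVELS):
    """Fraction of networks observed at each level of assortativity."""
    counts = Counter(assortativity_level(a, levels) for a in a_index_values)
    total = len(a_index_values)
    return {name: count / total for name, count in counts.items()}


# Made-up A. Index values for five networks (not data from the paper).
sample = [0.75, 0.30, 0.05, -0.10, -0.45]
print(empirical_probabilities(sample))
```

Running the same function once per centrality metric, over the A. Index values of all the networks for that metric, would give one row of probabilities per metric.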

Related Work
To the best of our knowledge, all the results reported in the literature (e.g., Newman, 1999; Newman & Girvan, 2003; Noldus & Van Mieghem, 2015) on the assortativity of real-world network graphs are based on the degree centrality metric. Ours is the first effort to study the assortativity of real-world network graphs based on other commonly used centrality metrics such as betweenness centrality, eigenvector centrality and closeness centrality. We analyze the assortativity of a large collection of real-world network graphs (with a broad range of variation in node degree) and empirically estimate the likelihood of observing a real-world network graph to be neutral or assortative with respect to a centrality metric. In this section, we discuss results from the most closely related work on assortativity and centrality metrics in the literature.
Traditionally, based on the degree centrality metric, social networks have been found to be assortative (high degree nodes tend to attach to high degree nodes), whereas technological and biological networks have been observed to be dissortative (i.e., low degree nodes tend to attach to high degree nodes and vice-versa; Newman, 2003). The networks generated from theoretical models such as the Erdos-Renyi random networks (Erdos & Renyi, 1959), Barabasi-Albert scale-free networks (Barabasi & Albert, 1999) and the Watts-Strogatz small-world networks (Watts & Strogatz, 1998) have been observed to be neutral (neither assortative nor dissortative) with respect to the degree centrality metric (Newman, 1999). In addition, networks that evolve with time without any constraints have been observed to reach a maximum entropy state (entropy is a quantitative measure of robustness; Demetrius & Manke, 2005) with a heterogeneous connectivity distribution, and in such a state networks have usually been dissortative (Johnson et al., 2010) with respect to the degree centrality metric. On the other hand, networks that evolve with constraints (with respect to the number of links a node can maintain) tend to transition from being dissortative to assortative with time (Konig et al., 2010). Also, synthetic social network graphs generated using Monte Carlo Metropolis-Hastings type algorithms (Chib & Greenberg, 1995) were observed to quickly evolve to a giant component if the edge distribution (based on remaining degree: one less than the degree centrality; Newman, 1999) follows assortative matching (rather than dissortative matching; Newman, 2003).
Iyer et al. (2013) analyzed the robustness of networks under targeted removal of vertices that are ranked higher with respect to centrality metrics. It has been observed that dissortative networks degrade more rapidly due to the removal of vertices with higher degree, whereas assortative networks degrade more rapidly due to the removal of vertices with higher betweenness (at least for the first 25% of the vertices), as the high degree vertices in assortative networks tend to form a concentrated interconnected core that would be difficult to break by removing a few vertices. For neutral networks (with assortativity index close to 0), targeted node removal based on degree has been observed to be the most effective method to degrade the network and targeted removal based on eigenvector centrality has been observed to be the least effective (Iyer et al., 2013). The findings from this paper could be considered complementary to the above research results, as we observe real-world network graphs to be more likely assortative with respect to the EVC metric; hence, removal of vertices with higher EVC is more likely to have a relatively less degrading effect on the assortativity of networks.
Zhang et al. (2012) argued that the assortativity level of the different communities with their neighborhood need not be the same as the assortativity level of the entire network. This could be attributed to the differences in the connectivity of the vertices in the various communities to their respective outside world. In this regard, Zhang et al. (2012) proposed an alternate metric called the Universal Assortativity Coefficient (UAC), defined for a community (subgraph) of vertices as the sum of the local assortativity indices of the edges (Newman, 1999) emanating from the vertices that are part of the community. The local assortativity index of an edge is calculated as per the remaining degree based formulation proposed by Newman (1999); edges with a positive local assortativity index are referred to as assortative and edges with a negative local assortativity index are referred to as dissortative. Zhang et al. (2012) claimed that a globally assortative network could still have a majority of its edges locally dissortative and vice-versa. Similar to local edge assortativity, a measure called local node assortativity (Piraveenan et al., 2008), based on the remaining degree of a node, has also been proposed in the literature; the sum of the local node assortativity values is equal to the network assortativity. It has been shown by Piraveenan et al. (2009) that distribution profiles of the local assortativity of nodes vs. their degrees could be used to identify assortative hubs in social and biological networks and dissortative hubs in scale-free networks such as the Internet. All of the above analyses have been based only on the degree centrality metric and rely heavily on the concept of remaining degree.
Joyce et al. (2010) proposed the notion of leverage centrality to capture the assortative or dissortative neighborhood of a node. The range of values for leverage centrality is (-1, ..., 1): positive values indicate an assortative neighborhood and negative values indicate a dissortative neighborhood. A node has a positive leverage centrality if it is connected to more nodes than its neighbors (assortative neighborhood); a node connected to fewer nodes than its neighbors has a negative leverage centrality (dissortative neighborhood). Nodes having higher leverage centrality are perceived to be important for facilitating information flow to and from their neighbors. The leverage centrality of a node is estimated simply based on the degree of the node and that of its neighbors; the metric has been observed to be strongly correlated with betweenness centrality and weakly correlated with eigenvector centrality. Per the trends observed in this paper, we expect a real-world network to be more likely to be neutral (neither assortative nor dissortative) with respect to the leverage centrality metric (as is also observed for the betweenness centrality metric).
Meghanathan (2015) observed the degree centrality and betweenness centrality metrics to be highly correlated for real-world network graphs; we opine that such a correlation is consistent with the real-world network graphs in this paper exhibiting almost similar levels of assortativity with respect to both these centrality metrics. Meghanathan (2016) showed that a maximal assortative matching of vertices (with the objective of maximizing the assortativity index) in real-world network graphs (with respect to the degree centrality metric) cannot maximize the number of matched vertices and vice-versa. We attribute such a phenomenon to the relatively neutral levels of assortativity of the edges with respect to the degree centrality metric and its closely correlated betweenness centrality metric. As we observe the closeness centrality metric to exhibit stronger levels of assortativity, we opine that a maximal assortative matching of vertices based on closeness centrality would relatively increase the number of vertices matched (compared to the degree centrality metric).
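To make the degree-only nature of the leverage centrality discussed above concrete, the following is a minimal sketch. It assumes the commonly stated form of the definition by Joyce et al. (2010), in which the leverage centrality of a node with degree k_i is the average of (k_i - k_j) / (k_i + k_j) over its neighbors j; the function name, the dictionary-of-sets graph representation and the value assigned to isolated nodes are choices made for this illustration only.

```python
def leverage_centrality(adjacency):
    """Leverage centrality of every node of an undirected graph.

    adjacency: dict mapping each node to the set of its neighbors.
    Assumed formula: average over the neighbors j of node i of
    (k_i - k_j) / (k_i + k_j), where k denotes node degree.
    Isolated nodes are assigned 0.0 (an assumption made here).
    """
    degree = {v: len(nbrs) for v, nbrs in adjacency.items()}
    result = {}
    for v, nbrs in adjacency.items():
        if not nbrs:
            result[v] = 0.0
            continue
        total = sum(
            (degree[v] - degree[u]) / (degree[v] + degree[u]) for u in nbrs
        )
        result[v] = total / degree[v]
    return result


# Star graph: the hub has more connections than each of its neighbors,
# so its value is positive; every leaf has fewer, so its value is negative.
star = {"hub": {"a", "b", "c"}, "a": {"hub"}, "b": {"hub"}, "c": {"hub"}}
print(leverage_centrality(star))
```

In the star graph the hub has more connections than any of its neighbors and the leaves have fewer, which matches the sign convention described above.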

Conclusions and Future Work
We have shown that the assortativity classification of real-world network graphs depends on the node-level centrality metric used to compute the assortativity index values of the edges. As part of this analysis, we formally propose five levels of assortativity and their associated ranges in the space of assortativity index values from -1 to 1. We computed the assortativity index values for a suite of 50 real-world network graphs (with spectral radius ratio for node degree ranging from 1.01 to 3.48) with respect to each of the four commonly used centrality metrics: degree centrality (DegC), eigenvector centrality (EVC), betweenness centrality (BWC) and closeness centrality (ClC). We observe about 80% of the real-world network graphs to exhibit more than one assortativity level (depending on the centrality metric used to compute the assortativity index values): 56% exhibit two assortativity levels and 24% exhibit three assortativity levels. For a majority of these real-world network graphs, the level of assortativity transitions from dissortative to neutral, or from neutral to weakly assortative to strongly assortative, when the centrality metrics are considered in this order: BWC, DegC, EVC and ClC. Using the results of the assortativity analysis, we also estimated the empirical probability for a real-world network graph to exhibit a particular level of assortativity. We claim that a real-world network graph is more likely (probability of 0.72) to be neutral (neither assortative nor dissortative) with respect to the BWC metric and is more likely to be assortative (strongly or weakly assortative: probability of 0.38 + 0.28 = 0.66) with respect to the EVC and ClC metrics.
We have thus unraveled significant information about the assortativity of real-world network graphs with respect to other commonly used centrality metrics such as betweenness, closeness and eigenvector centrality. As part of future work, we plan to analyze the centrality-based assortativity of complex network graphs generated from theoretical models such as the Erdos-Renyi random network model (Erdos & Renyi, 1959), the Barabasi-Albert scale-free network model (Barabasi & Albert, 1999) and the Watts-Strogatz small-world network model (Watts & Strogatz, 1998). We also plan to investigate the use of centrality metrics (other than degree centrality) to compute maximal assortative matching and maximal dissortative matching (Meghanathan, 2016) for real-world network graphs.
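The centrality-based A. Index used throughout this analysis is a Pearson correlation coefficient over the centrality values at the two ends of every edge. The following is a minimal sketch of that computation for two of the metrics. It rests on several assumptions that are not taken from the paper: each undirected edge is counted in both directions, closeness centrality is taken as the inverse of the sum of shortest-path hop counts on a connected graph, a zero-variance case returns 0.0, and all function names are illustrative.

```python
from collections import deque
from math import sqrt


def degree_centrality(adj):
    """Number of neighbors of each node."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def closeness_centrality(adj):
    """Inverse of the sum of hop counts to all other nodes (connected graph assumed)."""
    scores = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from the source
            v = queue.popleft()
            for u in adj[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    queue.append(u)
        total = sum(dist.values())
        scores[source] = 1.0 / total if total else 0.0
    return scores


def assortativity_index(adj, centrality):
    """Pearson correlation of the centrality values at the ends of the edges."""
    xs, ys = [], []
    for v, nbrs in adj.items():
        for u in nbrs:  # each undirected edge appears once per direction
            xs.append(centrality[v])
            ys.append(centrality[u])
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation undefined; treated as neutral here (assumption)
    return cov / sqrt(var_x * var_y)


# A small made-up graph: a triangle (1, 2, 3) with a two-node tail (4, 5).
graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4}}
print(assortativity_index(graph, degree_centrality(graph)))
print(assortativity_index(graph, closeness_centrality(graph)))
```

Passing a different centrality dictionary to assortativity_index is all that changes between metrics, which is why the same graph can land at different levels of assortativity depending on the metric.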

Figure 1. Example to Illustrate the Computation of the Degree Centrality of the Vertices in a Graph

Figure 2. Example to Illustrate the Computation of the Eigenvector Centrality of the Vertices in a Graph

Figure 3. Example to Illustrate the Computation of the Betweenness Centrality of the Vertices in a Graph

Figure 4. Example to Illustrate the Computation of the Closeness Centrality of the Vertices in a Graph
Figure 5. Example to Illustrate the Calculation of the Assortativity Index based on Degree Centrality

20) Graph Drawing 2001 (GD01) Network (Batagelj & Mrvar, 2006): This is a network of 101 papers (vertices) that were cited as references in the papers published in the proceedings of the 2001 Graph Drawing (GD'01) conference; there exists an edge between two vertices if the corresponding papers have been co-cited in at least one paper published in the GD'01 conference.
21) Hypertext 2009 Network (HTN; Isella et al., 2011): This is a network of the face-to-face contacts of 115 attendees (vertices) of the ACM Hypertext 2009 conference held in Turin, Italy from June 29 to July 1, 2009.

Figure 7. Distribution of Assortativity Index Values for Real-World Networks (based on Betweenness Centrality)

Table 1. Range of Correlation Coefficient Values and the Corresponding Levels of Correlation

Table 3. Fundamental Properties of the Real-World Network Graphs used for Assortativity Analysis

[...] (MTB; Hayes, 200[...]): [...] suspected individuals and their relatives (vertices) reconstructed by Rodriguez using press accounts in the two major Spanish daily newspapers (El Pais and El Mundo) regarding the bombing of commuter trains in Madrid on March 11, 2004. There existed an edge between two vertices if the corresponding individuals were observed to have a link in the form of friendship, ties to any terrorist organization, co-participation in training camps and/or wars, or co-participation in any previous terrorist attacks.
Mexican Political Elite Network (MPN; Gil-Mendieta & Schmidt, 1996): This is a network of 35 Mexican presidents and their close collaborators (vertices); there exists an edge between two vertices if the corresponding two people have ties that could be either political, kinship, friendship or business ties.
33) ModMath Network (MMN; Batagelj & Mrvar, 2006): This is a network of 30 school superintendents (vertices) in Allegheny County, Pennsylvania, USA during the 1950s and early 1960s. There exists an edge between two vertices if at least one of the two corresponding superintendents has indicated the other person as a friend in a research survey conducted to see which superintendents (who are in office for at least a year) are more influential to effectively spread around some modern Math methods among the school systems in the county.
34) US Politics Books Network (PBN; Krebs, 2003): This is a network of 105 books (vertices) about US politics sold by Amazon.com around the time of the 2004 US presidential election. There exists an edge between two vertices if the corresponding two books were co-purchased by the same buyer (at least one buyer).
35) Primary School Contact Network (PSN; Gemmetto et al., 2014): This is a network of children and teachers (238 vertices) used in the study published in BMC Infectious Diseases, 2014 [40]. There exists an edge between two vertices if the corresponding persons were in contact for at least 20 seconds during the observation period.
36) Prison Friendship Network (PFN; MacRae, 1960): This is a network of 67 prison inmates (vertices) surveyed by John Gagnon in the 1950s regarding their sociometric choice. There exists an edge between two vertices if an inmate corresponding to at least one of them has listed the inmate corresponding to the other vertex as one of his/her closest friends.
37) San Juan Sur Family Network (SJN; Loomis et al., 1953): This is a network of 75 families (vertices) in San Juan Sur, Costa Rica, 1948. There exists an edge between two vertices if at least one of the corresponding families has visited the household of the family corresponding to the other vertex once or more.
[...] (Batagelj & Mrvar, 2006) [...] the production of 295 articles for the Social Networks Journal since its inception until 2008; there is an edge between two vertices if the corresponding authors co-authored at least one paper published in the journal.
31) Author Facebook Network (AFB): This is a network of the 171 friends (vertices) of the author in Facebook. There exists an edge between two vertices if the corresponding people are also friends of each other.
38) Scotland Corporate Interlocks Network (SDI; Scott, 1980): This is a network of multiple directors (a director who serves on multiple boards) and companies (a total of 230 vertices) during 1904-05 in Scotland. There exists an edge between two vertices v_i and v_j if any of the following are true: (i) both v_i and v_j correspond to two different multiple directors who are in the board of at least one company; (ii) one of the two vertices corresponds to a multiple director and the other vertex corresponds to one of the companies in whose board the person serves.
39) Senator Press Release Network (SPR; Grimmer, 2010): This is a network of 92 US senators (vertices) during the period from 2007 to 2010. There exists an edge between two senators if they issued at least one joint press release.
40) Slovenian Magazine Network (SMN; Batagelj & Mrvar, 2006): This is a network of 126 different magazines (vertices); there exists an edge between two vertices if at least one reader (among a total of 100,000 readers) indicated that s/he reads the corresponding two magazines as part of a survey conducted in 1999 and 2000.
41) Soccer World Cup 1998 Network (SWC;

Table 4. Fundamental Properties of the Real-World Network Graphs used for Assortativity Analysis

Table 5. Empirically Estimated Probability Values for the Assortative Level of a Real-World Network with respect to the Centrality Metrics