Recovery of Software Architecture Using a Partitioning Approach Based on the Fiedler Vector and Clustering

Software architecture recovery includes the extraction of design patterns. Patterns may be found using many techniques, such as Fiedler vectors, clustering methods and query languages. In this chapter, clustering methods and the general notion of the Fiedler vector are used to evaluate design patterns.


Introduction
A software system is composed of modules, such as procedures, files and functions (B.W. Kernighan, S.Lin, 1970)(R.J. Wilson, J.J. Watkins, Graphs, 1990). First, these modules should be classified into subsystems. For this purpose, we construct a graph G = (V, E) in which each vertex represents a module and each edge represents a relationship between two modules. We then partition the nodes into subsets so that the cohesion among nodes of the same class is maximized and the coupling between nodes of different classes is minimized.

Modules Identification
We cannot construct the graph without identifying the modules and their relations, so the first step is to find the modules and the relations among them.
The following ways can be adopted to find modules:
1) The easiest way is to treat each file as a module, because the functions in a file are semantically related.
2) We can consider a group of files as a module, but then the question is which files should be grouped.
3) We can also consider each procedure as a module.
Here we follow the easiest approach, i.e. we consider each file as a module.

Relations
There are three types of relations among files:
f1 Useproc f2: a function in f1 calls a function in f2.
f1 Usevar f2: a function in f1 uses a variable defined in f2.
f1 Implementby f2: a function declared in header file f1 is implemented in f2.
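A minimal sketch of this graph construction in Python, assuming the relation facts have already been extracted from the code base (the file names below are hypothetical). Each file becomes a vertex, and any pair of files linked by at least one of the three relations becomes an edge; the graph is treated as undirected here because the Laplacian used later is symmetric.

```python
from collections import defaultdict

# Hypothetical relation facts (source_file, relation, target_file); in practice
# these would be produced by a parser run over the code base.
FACTS = [
    ("main.c", "Useproc", "parser.c"),
    ("parser.c", "Usevar", "globals.c"),
    ("parser.h", "Implementby", "parser.c"),
    ("main.c", "Usevar", "globals.c"),
]

RELATIONS = {"Useproc", "Usevar", "Implementby"}


def build_module_graph(facts):
    """Return (modules, edges): one vertex per file, one undirected edge per related pair."""
    modules = set()
    edges = defaultdict(set)  # frozenset({f1, f2}) -> set of relation kinds on that pair
    for f1, rel, f2 in facts:
        if rel not in RELATIONS:
            raise ValueError(f"unknown relation: {rel}")
        modules.update((f1, f2))
        if f1 != f2:  # ignore self-relations
            edges[frozenset((f1, f2))].add(rel)
    return sorted(modules), dict(edges)


modules, edges = build_module_graph(FACTS)
print(modules)     # ['globals.c', 'main.c', 'parser.c', 'parser.h']
print(len(edges))  # 4
```

Keeping the relation kinds on each edge would allow them to be weighted differently later; that is a design choice of this sketch, not something the chapter prescribes.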
In this way, we can construct a graph (B.W. Kernighan, S.Lin, 1970)(R.J. Wilson, J.J. Watkins, Graphs, 1990) from these modules and the relations among them. The next step is to classify the modules into subsystems, which is done by dividing the graph into subgraphs using the concept of the Fiedler vector. First, we need to know what a Fiedler vector (M.Fiedler) is.
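For the input graph G = (V, E):
1) calculate the adjacency matrix,
2) calculate the degree matrix,
3) calculate the Laplacian matrix.

The adjacency matrix is

$$A(i,j) = \begin{cases} 1 & \text{if } (i,j) \in E \\ 0 & \text{otherwise} \end{cases}$$

The degree matrix D of the graph is the diagonal matrix of the row sums of the adjacency matrix, D = diag(deg(1), …, deg(|V|)) with deg(i) = Σ_j A(i,j). The Laplacian matrix L is the difference between the degree matrix and the adjacency matrix, L = D − A (B.Mohar, 1997):

$$L(i,j) = \begin{cases} \deg(i) & \text{if } i = j \\ -1 & \text{if } i \neq j \text{ and } (i,j) \in E \\ 0 & \text{otherwise} \end{cases}$$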
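As a minimal sketch, the three steps can be computed with NumPy for a small undirected graph whose nodes are numbered 0 to n−1; the path graph used here is a hypothetical example.

```python
import numpy as np


def laplacian(n, edges):
    """Build A, D and L = D - A for an undirected graph on nodes 0..n-1."""
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0      # A(i, j) = 1 if (i, j) in E, 0 otherwise
    D = np.diag(A.sum(axis=1))       # degree matrix: diagonal of the row sums of A
    return A, D, D - A


# Hypothetical 4-node path graph 0-1-2-3.
A, D, L = laplacian(4, [(0, 1), (1, 2), (2, 3)])
print(L)
```

Every row of L sums to zero, which is why (1, 1, …, 1)^T is always an eigenvector with eigenvalue zero.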
The Laplacian matrix is symmetric. The vector (1, 1, …, 1)^T is an eigenvector corresponding to the trivial eigenvalue zero. To obtain the Fiedler vector, we first arrange the eigenvalues in ascending order, 0 = λ1 ≤ λ2 ≤ … ≤ λ|V|. The eigenvector corresponding to the second smallest eigenvalue λ2 is referred to as the Fiedler vector (M.Fiedler, 1975).
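A short sketch, assuming a connected undirected graph given by its Laplacian. `numpy.linalg.eigh` is intended for symmetric matrices and returns the eigenvalues in ascending order, so the Fiedler vector is the eigenvector in column 1. Its sign is arbitrary, so any later use should not depend on it.

```python
import numpy as np


def fiedler_vector(L):
    """Return (lambda_2, Fiedler vector) for a symmetric Laplacian L."""
    # eigh returns eigenvalues in ascending order, with eigenvectors as columns.
    eigenvalues, eigenvectors = np.linalg.eigh(L)
    return eigenvalues[1], eigenvectors[:, 1]


# Laplacian of a hypothetical path graph 0-1-2-3.
L = np.array([[ 1, -1,  0,  0],
              [-1,  2, -1,  0],
              [ 0, -1,  2, -1],
              [ 0,  0, -1,  1]], dtype=float)
lam2, x = fiedler_vector(L)
print(round(lam2, 4))  # 0.5858, i.e. 2 - sqrt(2)
```

For this path graph the components of x change monotonically along the path, which is the property the ordering step relies on.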
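Once the Fiedler vector is known, we can decompose the graph into subgraphs. To do this, a path sequence through the nodes is computed as a permutation π, chosen so that the elements of the edge weight matrix W decrease as the path is traversed: if π(i) < π(j) < π(k), then W(i,j) > W(i,k) and W(j,k) > W(i,k). Consider a vector x = (x_1, x_2, …, x_|V|)^T of continuous variables x_i and the penalty function

$$g(\mathbf{x}) = \sum_{i=1}^{|V|} \sum_{j=1}^{|V|} W(i,j)\,(x_i - x_j)^2$$

subject to the constraints

$$\sum_{i=1}^{|V|} x_i^2 = 1 \quad\text{and}\quad \sum_{i=1}^{|V|} x_i = 0.$$

The vector that minimizes g(x) under these constraints is the Fiedler vector of the Laplacian D − W, and the permutation π is obtained by sorting its components.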
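The following sketch, assuming a connected graph with symmetric non-negative weights W (the small example graph is hypothetical), computes the penalty g(x) and obtains the ordering π by sorting the Fiedler-vector components. Since g(x) = 2 x^T L x with L = D − W, the unit-norm, zero-sum vector that minimizes g is the Fiedler vector.

```python
import numpy as np


def penalty(W, x):
    """g(x) = sum_i sum_j W(i, j) * (x_i - x_j)^2."""
    diff = x[:, None] - x[None, :]
    return float(np.sum(W * diff ** 2))


def spectral_order(W):
    """Serial ordering pi: node indices sorted by their Fiedler-vector components."""
    L = np.diag(W.sum(axis=1)) - W
    _, vecs = np.linalg.eigh(L)
    x = vecs[:, 1]          # unit norm and orthogonal to (1, ..., 1)^T
    return np.argsort(x), x


# Hypothetical weighted graph: the path 2-0-3-1 with its node labels shuffled.
W = np.zeros((4, 4))
for i, j, w in [(2, 0, 3.0), (0, 3, 2.0), (3, 1, 3.0)]:
    W[i, j] = W[j, i] = w
pi, x = spectral_order(W)
print(pi.tolist())  # the path order [2, 0, 3, 1] or its reverse
```

Because the Fiedler vector is only defined up to sign, the recovered sequence may come out reversed; both directions describe the same path.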

Decomposing a graph into sub graphs
By making use of the Fiedler vector, the graph is divided into subgraphs (R.J. Wilson, J.J. Watkins, 1990). The neighbourhood of node i consists of the center node i together with its immediate neighbours, that is, the nodes connected to it by edges in the graph.

$$\hat{N}_i = \{i\} \cup \{u : (i,u) \in E\}$$
Each node is assigned a measure of its significance as the center of a neighbourhood, and center nodes are selected on the basis of this measure while traversing the path defined by the Fiedler vector. First, weights are assigned to the nodes according to their rank order in the permutation: the weight of node i ∈ V is w_i = Rank(π(i)). A score is then calculated for each node from these weights, and the score indicates the significance of the node. The score function combines two terms weighted by C_1 and C_2, which are threshold values determined heuristically; P denotes the set of nodes on the perimeter of the graph.
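The weighting and scoring steps can be sketched in Python. The text does not spell out the score formula, so the combination used here, S_i = C_1·(deg(i) + |N̂_i ∩ P|) + C_2·w_i, is an assumption that only follows the verbal description of its two terms (degree plus closeness to the perimeter, and Fiedler rank). The perimeter set, the Fiedler values and the defaults C_1 = C_2 = 1 are hypothetical inputs.

```python
import numpy as np


def neighbourhoods(A):
    """N^_i = {i} U {u : (i, u) in E} for every node i of adjacency matrix A."""
    n = len(A)
    return [{i} | {u for u in range(n) if A[i, u]} for i in range(n)]


def node_scores(A, fiedler, perimeter, c1=1.0, c2=1.0):
    """Rank weights w_i and an ASSUMED score S_i = c1*(deg(i) + |N^_i & P|) + c2*w_i."""
    n = len(A)
    order = np.argsort(fiedler)            # permutation pi along the Fiedler vector
    w = np.empty(n, dtype=int)
    w[order] = np.arange(1, n + 1)         # w_i = rank of node i in pi (1-based)
    deg = A.sum(axis=1)
    nbhd = neighbourhoods(A)
    scores = [float(c1 * (deg[i] + len(nbhd[i] & perimeter)) + c2 * w[i])
              for i in range(n)]
    return w, scores


# Hypothetical path graph 0-1-2-3 with its end nodes taken as the perimeter P.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
fiedler = np.array([-0.9, -0.4, 0.4, 0.9])   # hypothetical Fiedler components
w, scores = node_scores(A, fiedler, perimeter={0, 3})
print(w.tolist(), scores)  # [1, 2, 3, 4] [3.0, 5.0, 6.0, 6.0]
```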
The first term depends on the degree of the node and its proximity to the perimeter, so the nodes are sorted according to their distance from the perimeter. This is proposed because it is better to decompose the outermost layer first. The second term ensures that the nodes ranked first in the Fiedler vector are visited first. The scoring function is used to locate the non-overlapping neighbourhoods of the graph G. We traverse the sorted list until we find a node k that is not on the perimeter and whose score does not exceed the scores of its neighbours. When this condition is satisfied, node k together with its neighbours forms the first subgraph. The process is repeated for all nodes that satisfy the condition. Next, the subgraphs that overlap with these neighbourhoods are identified, and these overlapping subgraphs are input to a clustering algorithm. There are two major approaches to subsystem classification.
In the top-down approach, a single subsystem containing all modules is created first, and the current subsystem is then iteratively decomposed to create subsystems at lower levels. In the bottom-up approach, each module is initially treated as a subsystem, and subsystems are then iteratively merged to create those at higher levels.
Top-down approaches suffer from exponential complexity, as in the A* algorithm, so we follow the bottom-up approach:
1) For clustering, calculate the similarity between every pair of nodes.
2) Identify the pairs of nodes that are most similar. Then create a cluster by taking the union of the most similar pair; alternatively, more than one cluster can be created at once by taking the unions of several pairs from this set.
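A minimal sketch of this bottom-up loop, assuming a precomputed symmetric node dissimilarity matrix P. The chapter does not say how the distance between two multi-node clusters is measured or when merging stops, so average linkage and a target number of clusters (`n_clusters`) are assumptions of this sketch. It merges only the single most similar pair per step, the simpler of the two options above.

```python
import numpy as np


def agglomerate(P, n_clusters):
    """Bottom-up clustering: start from singletons and repeatedly merge the most similar pair.

    Cluster-to-cluster dissimilarity is the mean node-pair dissimilarity
    (average linkage, an assumption of this sketch).
    """
    clusters = [{i} for i in range(len(P))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = np.mean([P[i, j] for i in clusters[a] for j in clusters[b]])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] |= clusters[b]   # union of the most similar pair
        del clusters[b]              # b > a, so index a is unaffected
    return clusters


# Hypothetical dissimilarities: nodes {0, 1} and {2, 3} form two tight groups.
P = np.array([[0.0, 0.1, 0.9, 0.8],
              [0.1, 0.0, 0.7, 0.9],
              [0.9, 0.7, 0.0, 0.2],
              [0.8, 0.9, 0.2, 0.0]])
result = agglomerate(P, 2)
print(result)  # [{0, 1}, {2, 3}]
```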

Similarity/Dissimilarity measures
The most similar pair of nodes is the pair with the highest similarity measure or, equivalently, the lowest dissimilarity.
If a component of the system is connected entirely to just one other component, that connection should be assigned a lower dissimilarity than any other connection that is not complete. The measure is based on the degrees of the two vertices and the proportion of neighbours they have in common. Let p be the dissimilarity matrix, defined by

$$p(i,j) = \frac{\deg(i) + \deg(j) - 2\,b(i,j)}{\deg(i) + \deg(j) - b(i,j)}$$

where deg(i) is the degree of vertex i in the graph and b(i,j) is the number of common neighbours of vertices i and j. The numerator deg(i) + deg(j) − 2b(i,j) is the number of vertices connected to exactly one of i and j, while the denominator deg(i) + deg(j) − b(i,j) is the number of vertices connected to at least one of them. Note that if deg(i) = deg(j) = b(i,j), then p(i,j) = 0, so i and j are completely similar. Note also that if i and j have no common neighbour, then p(i,j) = 1, so i and j are completely dissimilar.
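As a sketch, the dissimilarity matrix p can be computed from an adjacency matrix with NumPy, using the matrix product A·A to count common neighbours (assuming a simple graph without self-loops). The definition does not cover two isolated vertices, where the denominator is zero; this sketch assigns them p = 1 as an assumption. The example graph is hypothetical.

```python
import numpy as np


def dissimilarity(A):
    """p(i, j) = (deg(i) + deg(j) - 2 b(i, j)) / (deg(i) + deg(j) - b(i, j))."""
    A = (np.asarray(A) != 0).astype(int)
    deg = A.sum(axis=1)
    b = A @ A                      # b[i, j] = number of common neighbours of i and j
    n = len(A)
    p = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i == j:
                continue           # a vertex is completely similar to itself
            union = deg[i] + deg[j] - b[i, j]   # vertices adjacent to i or j
            # Two isolated vertices share nothing: treat as dissimilar (assumption).
            p[i, j] = (deg[i] + deg[j] - 2 * b[i, j]) / union if union else 1.0
    return p


# Hypothetical graph: 2 and 3 have identical neighbours {0, 1}; 0 also links to 4.
A = np.zeros((5, 5), dtype=int)
for i, j in [(0, 2), (0, 3), (1, 2), (1, 3), (0, 4)]:
    A[i, j] = A[j, i] = 1
p = dissimilarity(A)
```

Here p(2,3) = 0 (identical neighbourhoods), p(0,2) = 1 (no common neighbour), and p(0,1) = 1/3 because 0 and 1 share two of the three vertices adjacent to either of them.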
After clustering, the solution needs to be optimized. For this, we consider two measurements: intra-connectivity and inter-connectivity.

Intra-connectivity
Intra-connectivity measures the connectivity between the components that are grouped together in the same cluster. The degree of intra-connectivity should be high for a good subsystem partitioning, because the modules grouped within a common subsystem share many software-level features.
$$A_i = \frac{m_i}{N_i^2}$$

where A_i is the intra-connectivity measurement of cluster i, N_i is the number of components in the cluster and m_i is the number of intra-edge dependencies.
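A direct translation of A_i into Python, assuming dependencies are given as (source, target) pairs; the component names are hypothetical.

```python
def intra_connectivity(cluster, edges):
    """A_i = m_i / N_i**2 for a cluster (set of components) and dependency edges."""
    n_i = len(cluster)
    # m_i: dependencies whose source and target both lie inside the cluster
    m_i = sum(1 for u, v in edges if u in cluster and v in cluster)
    return m_i / n_i ** 2


deps = [("a", "b"), ("b", "c"), ("c", "d")]   # hypothetical dependencies
a_1 = intra_connectivity({"a", "b", "c"}, deps)
print(round(a_1, 3))  # 0.222
```

Two of the three dependencies stay inside the cluster {a, b, c}, giving 2/3² = 2/9; the edge to d leaves the cluster and does not count.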

Inter-connectivity
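Inter-connectivity measures the connectivity between two distinct clusters and should be as low as possible. It is denoted by E_ij, where i and j are clusters consisting of N_i and N_j components and m_ij is the number of inter-edge dependencies between them:

$$E_{ij} = \begin{cases} 0 & \text{if } i = j \\ \dfrac{m_{ij}}{2 N_i N_j} & \text{if } i \neq j \end{cases}$$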
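The same kind of sketch for E_ij, again with dependencies as (source, target) pairs and hypothetical component names; dependencies in either direction between the two clusters are counted in m_ij.

```python
def inter_connectivity(ci, cj, edges):
    """E_ij = 0 if i == j, otherwise m_ij / (2 * N_i * N_j)."""
    if ci == cj:
        return 0.0
    # m_ij: dependencies running between the two clusters, in either direction
    m_ij = sum(1 for u, v in edges
               if (u in ci and v in cj) or (u in cj and v in ci))
    return m_ij / (2 * len(ci) * len(cj))


deps = [("a", "c"), ("c", "b"), ("a", "b")]   # hypothetical dependencies
e_12 = inter_connectivity({"a", "b"}, {"c"}, deps)
print(e_12)  # 0.5
```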
In this way the clusters are derived. To apply clustering techniques to software architecture recovery and reengineering, the object-attribute data matrix should be converted into an object-object data matrix, so that the input reflects the interconnectivity of the components. The clustering techniques are then used to minimize the interconnections among components. We now explain how the clustering technique can be used to support the identification of a design pattern.
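The chapter does not specify how the object-attribute matrix is converted. One simple possibility, shown here as an assumption rather than the chapter's method, is to count the attributes two components share, which is the matrix product M·M^T; the matrix M is hypothetical.

```python
import numpy as np

# Hypothetical object-attribute matrix: rows are components, columns are
# attributes (e.g. functions or variables a component uses); 1 = uses.
M = np.array([[1, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 0, 1, 1]])

# Assumed conversion: S[i, j] = number of attributes shared by components i and j.
S = M @ M.T
np.fill_diagonal(S, 0)   # ignore each component's similarity with itself
print(S.tolist())  # [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
```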
Consider a system in which some client classes directly access some subsystem classes. With existing software architecture recovery assistants, especially approaches based on file names, the recovered result may look perfect for the subsystem. In other words, the architecture recovered by this type of technique is close to, or the same as, the modules partitioned by the designer.
Architecture capture is certainly important and valuable, but we are also concerned with ways to improve the architecture rather than simply capture it. Besides, in practice the directory structure often already reveals the high-level components of a system, so simply capturing the architecture at a higher level of abstraction has limited benefit. Applying the clustering techniques to this example gives very different partitions. In fact, the subsystem no longer exists as a unit, since many subsystem classes are directly accessed by, or related to, client classes. In other words, the clustering technique reveals that some classes in the subsystem are more closely related to client classes, which contradicts the design concept.

Ideally, the subsystem classes should be grouped together as one unit. In situations like this, clustering techniques can prompt the architect to find ways to keep the subsystem classes more cohesive. The Facade pattern provides a common interface to the subsystem classes and facilitates separation of concerns. In the new pattern-based design, the clustering method groups the subsystem classes into the same unit, which is consistent with the original design. In this example, the clustering technique helps the adoption of a design pattern that reduces the coupling between the subsystem and its clients.
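To make the coupling argument concrete, the inter-connectivity measure E_ij can be evaluated on a hypothetical toy version of this example: three client classes and four subsystem classes, before and after a facade class is placed in the subsystem cluster. The class names, and the assumption that every client uses every subsystem class, are illustrative only.

```python
def inter_connectivity(ci, cj, edges):
    """E_ij = m_ij / (2 * N_i * N_j) for two distinct clusters ci and cj."""
    m_ij = sum(1 for u, v in edges
               if (u in ci and v in cj) or (u in cj and v in ci))
    return m_ij / (2 * len(ci) * len(cj))


clients = {"Client1", "Client2", "Client3"}      # hypothetical client classes
subsystem = {"S1", "S2", "S3", "S4"}             # hypothetical subsystem classes

# Before: every client depends directly on every subsystem class.
before = [(c, s) for c in clients for s in subsystem]

# After: clients depend only on the facade, which forwards to the subsystem.
subsystem_with_facade = subsystem | {"Facade"}
after = [(c, "Facade") for c in clients] + [("Facade", s) for s in subsystem]

e_before = inter_connectivity(clients, subsystem, before)
e_after = inter_connectivity(clients, subsystem_with_facade, after)
print(e_before, e_after)  # 0.5 0.1
```

With these assumed numbers, E drops from 12/24 = 0.5 to 3/30 = 0.1, because twelve direct client-to-subsystem dependencies are replaced by three dependencies on the facade, while the facade's own dependencies stay inside the subsystem cluster.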

Conclusion
Graph matching has exponential complexity in general; although we have proven that the complexity is linear in certain situations, this does not hold for all problems. To address this, the matching problem is decomposed into subunits (smaller graphs): the Fiedler vector is used to partition the graph, and the subunits are then investigated using an edit distance method. This process can be organized as a hierarchical framework that is suitable for parallel computation.