Modeling the Parallelization of the Edmonds-Karp Algorithm and Application

Many optimization problems can be reduced to the maximum flow problem in a network. However, the maximum flow problem is equivalent to the problem of the minimum cut, as shown by Fulkerson and Ford (Fulkerson & Ford, 1956). There are several algorithms of the graph’s cut, such as the Ford-Fulkerson algorithm (Ford & Fulkerson, 1962), the Edmonds-Karp algorithm (Edmonds & Karp, 1972) or the Goldberg-Tarjan algorithm (Goldberg & Tarjan, 1988). In this paper, we study the parallel computation of the Edmonds-Karp algorithm which is an optimized version of the Ford-Fulkerson algorithm. Indeed, this algorithm, when executed on large graphs, can be extremely slow, with a complexity of the order of O|V|.|E| (where |V| designates the number of vertices and |E| designates the number of the edges of the graph). So why we are studying its implementation on inexpensive parallel platforms such as OpenMp and GP-GPU. Our work also highlights the limits for parallel computing on this algorithm.


Introduction
In this paper, we study the parallelization of the Edmonds-Karp algorithm. Originally, this algorithm was proposed by Ford-Fulkerson (Ford & Fulkerson, 1962), but implemented by Edmonds-Karp (Edmonds & Karp, 1972). A formal approach of this algorithm was also proposed by Lammich P. and Sefidgar S. R. (2016). This algorithm consists of finding paths from the source vertex S to the terminal vertex T, with a capacity available on all their arcs. The process stops when all possible paths with nonzero capability are exhausted. This algorithm, as simple as it looks, is very slow when run on large graphs. Thus, different versions have been proposed for parallel architectures, to obtain better performances. Edmonds and Karp had already proposed an algorithm parallelization and optimization approach for shared memory systems. Other parallel versions of this algorithm have also been developed (Karger & Stein, 1996;Vineet & Narayanan, 2008). In this work, we analyze the complexity of the parallelization of this algorithm, then we propose a parallel solution approach that we codify and test. maximum flow.

Example
As an illustration, the network of figure 1 consisting of 7 nodes (Wikipedia, 2014) can represent an electrical circuit.
Description: On the edges are represented the pairs f/c where f is the flow of current and c is the capacity. The residual capacity of u to v is CF(u, v)=c(u, v)-f(u, v). It is equal to the total capacity less the flow already used. If the net flow from u to v is negative, it contributes to the residual capacity. When Edmonds-Karp algorithm is applied on this network, the maximum flow rate is 5.

Minimum Cut
A cut is a set of vertices, and the cardinal of the cut is the number of edges having one end inside this set and the other outside (see figure 2). A cut is minimum if its cardinal is minimum.
The problem of the minimum cut (Min-Cut) is equivalent to the maximum flow problem, according to the flow-max/min-cut theorem (Fulkerson & Ford, 1956) that we recall below.
Theorem 1 In optimization, the flow-max / min-cut theorem states that given a flow graph, the maximum flow that can go from the source to the destination is equal to the minimum capacity that must be removed from the graph in order to prevent that any flow cannot go from the source to the destination.

Case of a Complete Undirected and Unweighted Graph
Complete graph: A complete graph G is a simple undirected graph in which every pair of distinct vertices is connected by a unique edge.
Let N be the number of nodes of a undirected complete graph representing a network. The weight (or flow) in each edge is assumed equal to 1. If CN denotes the minimum cut of this graph, we obtain the explicit relations in figure 3. Let be any partition of G N , in two sets of x elements and N-x elements. The number of edges common to x elements is 2 and the number of edges common to (N-x) other elements is − 2 . Hence the number of edges that have a source and a destination in the two sets that form the partition of G N is given in equation (1).
Thus the values of the cut are given by the function This function is strictly increasing for x [1, ( The maximum values of C are:{ The minimum values are reached for x=1 and x=N-1 which correspond to the limits of its domain of definition. C(1)=C(N-1)=N-1. Hence the result.

Algorithms Used
In this section, we assume: The Edmonds-Karp algorithm (Edmonds & Karp, 1972) is an implementation of the Ford-Fulkerson method for calculating the maximum flow in a flow network, with a time of order O|V|.|E| 2 .
The difference between the two algorithms is that in Edmonds-Karp algorithm (see algorithm 1 for description) the search is done in an orderly way: a search criterion of the path that improves the solution is defined. The solution path must be as short as possible (number of edges) and with a non-zero capacity of the flow. This path can be found with the Breadth-first search (BFS) algorithm, assuming all edges are of unit capacity.
While G f contains a path from the source S to the terminal T do Let C be that path with a minimum of edges

Principle of the algorithm
The Edmonds-Karp algorithm is based on 3 steps: 1. Search for an unexplored path from source S to terminal T. This step is performed using the Breadth-First Search (BFS) algorithm (algorithm 2) which ensures that the next path from S to T is minimal in number of edges. 2. Find the maximum flow that can be send on this path. This flow noted e is given by the equation (3).
3. Send the flow e on this path.
These operations are repeated until there is no augmenting (Note 2) path. The maximum flow is equal to the sum of flows coming from distinct paths from S to T. This can be described algorithmically as follows: • Initialize the flow f to 0 value on each edge.
• while there is a path from S to T unsaturated, increase the flow with the residual capacity e calculated until having a complete flow (all paths are saturated); • the maximum flow value of the Network is | f | =∑e.

Breadth First Search (BFS) Algorithm
This algorithm allows to browse a graph as follows: it starts by exploring a source node, then its successors, then the unexplored successors of successors, etc. Nodes already visited are marked to prevent the same node from being explored more than once. It can be summarized in the following steps: 1. Put the source node in the queue (Q). 2. Remove the source node from the queue to process it. 3. Put all unexplored neighbors in the queue. 4. If the queue is not empty, return to step 2.

Proposed Parallel Computation Approach
As we saw above, the Edmonds-Karp algorithm consists of three main steps: 1. The determination of the augmenting path. This step is performed sequentially. The paths are generated one by one, to avoid their duplication. On the other hand, the operation search for unvisited neighbors of a vertex will be performed in parallel. 2. The calculation of the residual capacity e. This operation is performed in parallel. Each computing unit (core) will process a group of edges assigned to it. The unit calculates the residual capacities of these arcs and then determines the relative minimum e i . Once these relative minima are calculated, the global minimum e is obtained by a reduction operation in parallel. 3. Updating the values of the edges. The previous step permit to calculate the minimum overall residual capacity e. It remains now to update, the values of the edges diminished by e. This operation is a classical, depending only to each edge. It can also be done in parallel, by distributing the edges to the different cores.
These three steps are repeated in a main loop which stop condition is "no augmenting path".
In Figure 4 each computing unit (CU) determines the local residual capacity of its assigned edges. After determining the global residual capacity, the CU updates their residual capacities.

General Principle of the Solution
The main loop of the Edmonds-Karp algorithm ends when there is no augmenting path unexplored from S to T. The Boolean function BFS (given in algorithm 2) performs this test. Where there is a path to explore, our approach is to create a list of vertices that make up that path and another one for their predecessors. The tasks of calculating the residual capacity of the path and the update of the values of the flows are parallelizable because they concern only this list of vertices.

Complexity of the Algorithm
We assume: |V|=N: number of nodes of the graph |E|=M: number of edges of the graph C max : maximum capacity of the edges We use two methods for estimating the complexity of the Edmonds-karp algorithm.

st method
The maximum number of paths from S to T is in the order of O(N).
Assuming that at each iteration, the flow increases by one, there are at most O(M.C max ) improvements. Thus, the total number of operations for the Edmonds-Karp algorithm is O(NM.C max )=O(NM 2 ).

nd method
The search for a path from the source S to the terminal T, with a minimum of edges using the algorithm BFS Each iteration of the "while" loop identifies an increasing path, and the increase can only be stopped when there is zero unsaturated edge. There is therefore at most M*N iterations.
In conclusion, the complexity of the Edmonds-Karp algorithm is of the order of O (M 2 N) identical to the 1 st method.
The simulations carried out concern networks ranging in size from 100 vertices (3323 edges) to 1000 vertices (499500 edges).

Principle of the solution
The use of the OpenMP directives concerns only two steps of the algorithm, trivially parallelizable: the calculation of the residual capacities and the update of the values of the edges.
During the search phase of the residual capacity, each thread calculates a local residual capacity in relation to the nodes that it processes then in a critical area "#pragma omp critical" these threads contribute to the determination of the overall residual capacity.

Modeling of treatment time
In addition to the assumptions made in paragraph 3.2, we assume: α: OpenMP initialization time; β: Latency or delay for all threads to finish their tasks; T e : execution time of a sequential iteration; T it : execution time of a OpenMP iteration.
From the assumptions made in section 3.2, we deduce the sequential execution time (T Seq ) of the computation of the maximal flow at relation (4).
The relation (6) shows that the acceleration of this algorithm for very high values of M is majored to 3

Implementation on GPGPU
To port the Edmonds-Karp algorithm on GP-GPU (General-purpose Processing on Graphics Processing Units), we have as well as when it comes to the OpenMP, identify the parallelizable zoned: the calculation for the search for the minimum residual capacity and the update of edges capabilities. This is what we will discuss below.

Principle of the solution
The adjacency matrix is represented in vector form. Then we choose enough block sizes so that parallel calculations can be run at the same time. The calculation of the minimum capacity is done by a conventional reduction operation and the updates are done in parallel by the calculation units of the GPU.

Estimation of theoretical acceleration
Thus, the theoretical acceleration achieved is: Note that the acceleration is majored by 3.

Optimal value of Nwp
T = (3λ + (M + lb(Nwp) + 1) )M By deriving the expression of T with respect to Nwp, we obtain: (2) ⇒no value of Nwp cancels the derivative. T is strictly monotonous and increasing because ln(2) > 0.

Results
After having coded the Edmonds-Karp algorithm in OpenMP then in Cuda C, we carry out some tests to measure the impact of our approach.

Code of the Algorithm Used
For our simulations, we took an optimized implementation of the Edmonds-Karp algorithm, written in C++. The graph representation used is the adjacency matrix. Graphs can be real-world representations such as road networks, networks for distribution of water, electricity, gas, etc. The Edmonds-Karp algorithm comes in various versions depending on the programming language, particularly because of its practical use. We have made changes to highlight the parallel parts of the code without impacting its initial effectiveness.

Basic Data
The data used to test the different codes come from various sources such as:  9th DIMACS Implementation Challenge website (DIMACS, 2015);  GTgraph, a program generating random graphs of type (n, m) (Bader & Madduri, 2006);  GTgraph but of the type RMAT which uses a law of probability for the generation of the graphs;  RandomGraph: RandomGraph[m,n] returns a random pseudo-graph with n vertices and m edges;  or with Graph Generator -v3.1  Description: Table 2 shows datas recorded on OpenMp platforms and table 3 shows GP-GPU datas.

Remarks
 For OpenMP, the acceleration is quasi-stationary from 4 threads.  For both cases (OpenMP and GP-GPU), the acceleration increases with the density of the graph.  In our study, the GP-GPU presents a better performance than the OpenMP.

Conclusions
In this paper, we studied the possibilities of parallelization of the Edmonds-Karp algorithm and modeled the computation times of its sequential and parallel versions. We have estimated the theoretical acceleration then we proposed the corresponding parallel codes on inexpensive architectures: OpenMP and GPU. In both cases, the basic idea is to treat the nodes of the network in parallel: • when looking for the maximum flow that can cross a path; • when updating the residual capacities of the nodes.
The tests we have done on these parallel versions show that they improve the calculation time of the algorithm compared to its sequential version. The acceleration ranges from 1.4 to 2.2 for the OpenMP and from 1.3 to 2.6 for the GP-GPU. This variation is correlated with the average degree of the graph. The acceleration is also bounded by 3 because of the limitation of the parallelizable zone.