10.8 Bibliographic Remarks
Detailed discussions of graph theory and graph algorithms can be found in numerous texts. Gibbons [Gib85] provides a good reference to the algorithms presented in this chapter. Aho, Hopcroft, and Ullman [AHU74], and Cormen, Leiserson, and Rivest [CLR90] provide a detailed description of various graph algorithms and issues related to their efficient implementation on sequential computers.
The sequential minimum spanning tree algorithm described in Section 10.2 is due to Prim [Pri57]. Bentley [Ben80] and Deo and Yoo [DY81] present parallel formulations of Prim's MST algorithm. Deo and Yoo's algorithm is suited to a shared-address-space computer. It finds the MST in Q(n1.5) using Q(n0.5) processes. Bentley's algorithm works on a tree-connected systolic array and finds the MST in time Q(n log n) using n/log n processes. The hypercube formulation of Prim's MST algorithm in Section 10.2 is similar to Bentley's algorithm.
The MST of a graph can be also computed by using either Kruskal's [Kru56] or Sollin's [Sol77] sequential algorithms. The complexity of Sollin's algorithm (Problem 10.21) is Q(n2 log n). Savage and Jaja [SJ81] have developed a formulation of Sollin's algorithm for the CREW PRAM. Their algorithm uses n2 processes and solves the problem in time Q(log2 n). Chin, Lam, and Chen [CLC82] have developed a formulation of Sollin's algorithm for a CREW PRAM that uses processes and finds the MST in time Q(log2 n). Awerbuch and Shiloach [AS87] present a formulation of Sollin's algorithm for the shuffle-exchange network that uses Q(n2) processes and runs in time Q(log2 n). Doshi and Varman [DV87] present a Q(n2/p) time algorithm for a p-process ring-connected computer for Sollin's algorithm. Leighton [Lei83] and Nath, Maheshwari, and Bhatt [NMB83] present parallel formulations of Sollin's algorithm for a mesh of trees network. The first algorithm runs in Q(log2 n) and the second algorithm runs in Q(log4 n) for an n x n mesh of trees. Huang [Hua85] describes a formulation of Sollin's algorithm that runs in Q(n2/p) on a mesh of trees.
The single-source shortest paths algorithm in Section 10.3 was discovered by Dijkstra [Dij59]. Due to the similarity between Dijkstra's algorithm and Prim's MST algorithm, all the parallel formulations of Prim's algorithm discussed in the previous paragraph can also be applied to the single-source shortest paths problem. Bellman [Bel58] and Ford [FR62] independently developed a single-source shortest paths algorithm that operates on graphs with negative weights but without negative-weight cycles. The Bellman-Ford single-source algorithm has a sequential complexity of O (|V||E|). Paige and Kruskal [PK89] present parallel formulations of both the Dijkstra and Bellman-Ford single-source shortest paths algorithm. Their formulation of Dijkstra's algorithm runs on an EREW PRAM of Q(n) processes and runs in time Q(n log n). Their formulation of Bellman-Ford's algorithm runs in time Q(n|E|/p + n log p) on a p-process EREW PRAM where p |E|. They also present algorithms for the CRCW PRAM [PK89].
Significant work has been done on the all-pairs shortest paths problem. The source-partitioning formulation of Dijkstra's all-pairs shortest paths is discussed by Jenq and Sahni [JS87] and Kumar and Singh [KS91b]. The source parallel formulation of Dijkstra's all-pairs shortest paths algorithm is discussed by Paige and Kruskal [PK89] and Kumar and Singh [KS91b]. The Floyd's all-pairs shortest paths algorithm discussed in Section 10.4.2 is due to Floyd [Flo62]. The 1-D and 2-D block mappings (Problem 10.8) are presented by Jenq and Sahni [JS87], and the pipelined version of Floyd's algorithm is presented by Bertsekas and Tsitsiklis [BT89] and Kumar and Singh [KS91b]. Kumar and Singh [KS91b] present isoefficiency analysis and performance comparison of different parallel formulations for the all-pairs shortest paths on hypercube- and mesh-connected computers. The discussion in Section 10.4.3 is based upon the work of Kumar and Singh [KS91b] and of Jenq and Sahni [JS87]. In particular, Algorithm 10.4 is adopted from the paper by Jenq and Sahni [JS87]. Levitt and Kautz [LK72] present a formulation of Floyd's algorithm for two-dimensional cellular arrays that uses n2 processes and runs in time Q(n). Deo, Pank, and Lord have developed a parallel formulation of Floyd's algorithm for the CREW PRAM model that has complexity Q(n) on n2 processes. Chandy and Misra [CM82] present a distributed all-pairs shortest-path algorithm based on diffusing computation.
The connected-components algorithm discussed in Section 10.6 was discovered by Woo and Sahni [WS89]. Cormen, Leiserson, and Rivest [CLR90] discusses ways to efficiently implement disjoint-set data structures with ranking and path compression. Several algorithms exist for computing the connected components; many of them are based on the technique of vertex collapsing, similar to Sollin's algorithm for the minimum spanning tree. Most of the parallel formulations of Sollin's algorithm can also find the connected components. Hirschberg [Hir76] and Hirschberg, Chandra, and Sarwate [HCS79] developed formulations of the connected-components algorithm based on vertex collapsing. The former has a complexity of Q(log2 n) on a CREW PRAM with n2 processes, and the latter has similar complexity and uses processes. Chin, Lam, and Chen [CLC81] made the vertex collapse algorithm more efficient by reducing the number of processes to for a CREW PRAM, while keeping the run time at Q(log2 n). Nassimi and Sahni [NS80] used the vertex collapsing technique to develop a formulation for a mesh-connected computer that finds the connected components in time Q(n) by using n2 processes.
The single-source shortest paths algorithm for sparse graphs, discussed in Section 10.7.2, was discovered by Johnson [Joh77]. Paige and Kruskal [PK89] discuss the possibility of maintaining the queue Q in parallel. Rao and Kumar [RK88a] presented techniques to perform concurrent insertions and deletions in a priority queue. The 2-D block mapping, 2-D block-cyclic mapping, and 1-D block mapping formulation of Johnson's algorithm (Section 10.7.2) are due to Wada and Ichiyoshi [WI89]. They also presented theoretical and experimental evaluation of these schemes on a mesh-connected parallel computer.
The serial maximal independent set algorithm described in Section 10.7.1 was developed by Luby [Lub86] and its parallel formulation on shared-address-space architectures was motivated by the algorithm described by Karypis and Kumar [KK99]. Jones and Plassman [JP93] have developed an asynchronous variation of Luby's algorithm that is particularly suited for distributed memory parallel computers. In their algorithm, each vertex is assigned a single random number, and after a communication step, each vertex determines the number of its adjacent vertices that have smaller and greater random numbers. At this point each vertex gets into a loop waiting to receive the color values of its adjacent vertices that have smaller random numbers. Once all these colors have been received, the vertex selects a consistent color, and sends it to all of its adjacent vertices with greater random numbers. The algorithm terminates when all vertices have been colored. Note that besides the initial communication step to determine the number of smaller and greater adjacent vertices, this algorithm proceeds asynchronously.
Other parallel graph algorithms have been proposed. Shiloach and Vishkin [SV82] presented an algorithm for finding the maximum flow in a directed flow network with n vertices that runs in time O (n2 log n) on an n-process EREW PRAM. Goldberg and Tarjan [GT88] presented a different maximum-flow algorithm that runs in time O (n2 log n) on an n-process EREW PRAM but requires less space. Atallah and Kosaraju [AK84] proposed a number of algorithms for a mesh-connected parallel computer. The algorithms they considered are for finding the bridges and articulation points of an undirected graph, finding the length of the shortest cycle, finding an MST, finding the cyclic index, and testing if a graph is bipartite. Tarjan and Vishkin [TV85] presented algorithms for computing the biconnected components of a graph. Their CRCW PRAM formulation runs in time Q(log n) by using Q(|E| + |V|) processes, and their CREW PRAM formulation runs in time Q(log2 n) by using Q(n2/log2 n) processes.