12.7 Bibliographic Remarks
Dynamic programming was originally presented by Bellman [Bel57] for solving multistage decision problems. Various formal models have since been developed for DP [KH67, MM73, KK88b]. Several textbooks and articles present sequential DP formulations of the longest-common-subsequence problem, the matrix chain multiplication problem, the 0/1 knapsack problem, and the shortest-path problem [CLR90, HS78, PS82, Bro79].
Li and Wah [LW85, WL88] show that monadic serial DP formulations can be solved in parallel on systolic arrays as matrix-vector products. They further present a more concurrent but non-cost-optimal formulation by formulating the problem as a matrix-matrix product. Ranka and Sahni [RS90b] present a polyadic serial formulation for the string editing problem and derive a parallel formulation based on a checkerboard partitioning.
The DP formulation of a large class of optimization problems is similar to that of the optimal matrix-parenthesization problem. Some examples of these problems are optimal triangularization of polygons, optimal binary search trees [CLR90], and CYK parsing [AU72]. The serial complexity of the standard DP formulation for all these problems is Q(n3). Several parallel formulations have been proposed by Ibarra et al. [IPS91] that use Q(n) processing elements on a hypercube and that solve the problem in time Q(n2). Guibas, Kung, and Thompson [GKT79] present a systolic algorithm that uses Q(n2) processing cells and solves the problem in time Q(n). Karypis and Kumar [KK93] analyze three distinct mappings of the systolic algorithm presented by Guibas et al. [GKT79] and experimentally evaluate them by using the matrix-multiplication parenthesization problem. They show that a straightforward mapping of this algorithm to a mesh architecture has an upper bound on efficiency of 1/12. They also present a better mapping without this drawback, and show near-linear speedup on a mesh embedded into a 256-processor hypercube for the optimal matrix-parenthesization problem.
Many faster parallel algorithms for solving the parenthesization problem have been proposed, but they are not cost-optimal and are applicable only to theoretical models such as the PRAM. For example, a generalized method for parallelizing such programs is described by Valiant et al. [VSBR83] that leads directly to formulations that run in time O (log2 n) on O (n9) processing elements. Rytter [Ryt88] uses the parallel pebble game on trees to reduce the number of processing elements to O (n6/log n) for a CREW PRAM and O (n6) for a hypercube, yet solves this problem in time O (log2 n). Huang et al. [HLV90] present a similar algorithm for CREW PRAM models that run in time O ( log n) on O (n3.5 log n) processing elements. DeMello et al. [DCG90] use vectorized formulations of DP for the Cray to solve optimal control problems.
As we have seen, the serial polyadic formulation of the 0/1 knapsack problem is difficult to parallelize due to lack of communication locality. Lee et al. [LSS88] use specific characteristics of the knapsack problem and derive a divide-and-conquer strategy for parallelizing the DP algorithm for the 0/1 knapsack problem on a MIMD message-passing computer (Problem 12.2). Lee et al. demonstrate experimentally that it is possible to obtain linear speedup for large instances of the problem on a hypercube.