## 12.5 Nonserial Polyadic DP FormulationsIn nonserial polyadic DP formulations, in addition to processing subproblems at a level in parallel, computation can also be pipelined to increase efficiency. We illustrate this with the optimal matrix-parenthesization problem. ## 12.5.1 The Optimal Matrix-Parenthesization ProblemConsider the problem of multiplying n matrices, A Example 12.3 Optimal matrix parenthesization Consider three matrices A The objective of the parenthesization problem is to determine a parenthesization that minimizes the number of operations. Enumerating all possible parenthesizations is not feasible since there are exponentially many of them. Let C [i, j] be the optimal cost of multiplying the matrices A Given Equation 12.7, the problem reduces to finding the value of C [1, n]. The composition of costs of matrix chains is shown in Figure 12.7. Equation 12.7 can be solved if we use a bottom-up approach for constructing the table C that stores the values C [i, j]. The algorithm fills table C in an order corresponding to solving the parenthesization problem on matrix chains of increasing length. Visualize this by thinking of filling in the table diagonally (Figure 12.8). Entries in diagonal l corresponds to the cost of multiplying matrix chains of length l + 1. From Equation 12.7, we can see that the value of C [i, j] is computed as min{C [i, k]+C [k +1, j]+r ## Figure 12.7. A nonserial polyadic DP formulation for finding an optimal matrix parenthesization for a chain of four matrices. A square node represents the optimal cost of multiplying a matrix chain. A circle node represents a possible parenthesization.## Figure 12.8. The diagonal order of computation for the optimal matrix-parenthesization problem.In computing the cost of the optimal parenthesization sequence, the algorithm computes (n - 1) chains of length two. This takes time (n - 1)t The sequential complexity of the algorithm is Q(n Consider the parallel formulation of this algorithm on a logical ring of n processing elements. In step l, each processing element computes a single element belonging to the l The parallel run time of this algorithm is Q(n When using p processing elements (1 p n) organized in a logical ring, if there are n nodes in a diagonal, each processing element stores n/p nodes. Each processing element computes the cost C [i, j] of the entries assigned to it. After computation, an all-to-all broadcast sends the solution costs of the subproblems for the most recently computed diagonal to all the other processing elements. Because each processing element has complete information about subproblem costs at preceding diagonals, no other communication is required. The time taken for all-to-all broadcast of n/p words is t In order terms, T This formulation can use at most Q(n) processing elements to accomplish the task in time Q(n |