12.1 Consider the parallel algorithm for solving the 0/1 knapsack problem in Section 12.2.2. Derive the speedup and efficiency for this algorithm. Show that the efficiency of this algorithm cannot be increased beyond a certain value by increasing the problem size for a fixed number of processing elements. What is the upper bound on efficiency for this formulation as a function of t_{w} and t_{c}?

12.2 [LSS88] In the parallel formulation of the 0/1 knapsack problem presented in Section 12.2.2, the degree of concurrency is proportional to c, the knapsack capacity. Also this algorithm has limited data locality, as the amount of data to be communicated is of the same order of magnitude as the computation at each processing element. Lee et al. present another formulation in which the degree of concurrency is proportional to n, the number of weights. This formulation also has much more data locality. In this formulation, the set of weights is partitioned among processing elements. Each processing element computes the maximum profit it can achieve from its local weights for knapsacks of various sizes up to c. This information is expressed as lists that are merged to yield the global solution. Compute the parallel run time, speedup, and efficiency of this formulation. Compare the performance of this algorithm with that in Section 12.2.2.

12.3 We noticed that the parallel formulation of the longest-common-subsequence problem has an upper bound of 0.5 on its efficiency. It is possible to use an alternate mapping to achieve higher efficiency for this problem. Derive a formulation that does not suffer from this upper bound, and give the run time of this formulation.

Hint: Consider the blocked-cyclic mapping discussed in Section 3.4.1.

12.4 [HS78] The traveling salesman problem (TSP) is defined as follows: Given a set of cities and the distance between each pair of cities, determine a tour through all cities of minimum length. A tour of all cities is a trip visiting each city once and returning to the starting point. Its length is the sum of distances traveled. This problem can be solved using a DP formulation. View the cities as vertices in a graph G(V, E). Let the set of cities V be represented by {v_{1}, v_{2}, ..., v_{n}} and let S {v_{2}, v_{3}, ..., v_{n}}. Furthermore, let c_{i}_{,}_{j} be the distance between cities i and j. If f (S, k) represents the cost of starting at city v_{1}, passing through all the cities in set S, and terminating in city k, then the following recursive equations can be used to compute f (S, k):

**
Equation 12.9**

Based on Equation 12.9, derive a parallel formulation. Compute the parallel run time and the speedup. Is this parallel formulation cost-optimal?

12.5 [HS78] Consider the problem of merging two sorted files containing O (n) and O (m) records. These files can be merged into a sorted file in time O (m + n). Given r such files, the problem of merging them into a single file can be formulated as a sequence of merge operations performed on pairs of files. The overall cost of the merge operation is a function of the sequence in which they are merged. The optimal merge order can be formulated as a greedy problem and its parallel formulations derived using principles illustrated in this chapter.

Write down the recursive equations for this problem. Derive a parallel formulation for merging files using p processing elements. Compute the parallel run time and speedup, and determine whether your parallel formulation is cost-optimal.

12.6 [HS78] Consider the problem of designing a fault-tolerant circuit containing n devices connected in series, as shown in Figure 12.9(a). If the probability of failure of each of these devices is given by f_{i}, the overall probability of failure of the circuit is given by . Here, represents a product of specified terms. The reliability of this circuit can be improved by connecting multiple functional devices in parallel at each stage, as shown in Figure 12.9(b). If stage i in the circuit has r_{i} duplicate functional units, each with a probability of failure given by f_{i} , then the overall probability of failure of this stage is reduced to and the overall probability of failure of the circuit is given by . In general, for physical reasons, the probability of failure at a particular level may not be , but some function i (r_{i}, m_{i}). The objective of the problem is to minimize the overall probability of failure of the circuit, .

Construction cost adds a new dimension to this problem. If each of the functional units used at stage i costs c_{i} then due to cost constraints, the overall cost should be less than a fixed quantity c.

The problem can be formally defined as

where m_{i} > 0 and 0 < i n.

Let f_{i} (x) represent the reliability of a system with i stages of cost x. The optimal solution is given by f_{n} (c). The recursive equation for f_{i} (x) is as follows:

**
Equation 12.10**

Classify this formulation into one of the four DP categories, and derive a parallel formulation for this algorithm. Determine its parallel run time, speedup, and isoefficiency function.

12.7 [CLR90] Consider the simplified optimal polygon-triangulation problem. This problem can be defined as follows. Given a simple polygon, break the polygon into a set of triangles by connecting nodes of the polygon with chords. This process is illustrated in Figure 12.10. The cost of constructing a triangle with nodes v_{i}, v_{j}, and v_{k} is defined by a function f (v_{i}, v_{j}, v_{k}). For this problem, let the cost be the total length of the edges of the triangle (using Euclidean distance). The optimal polygon-triangulation problem breaks up a polygon into a set of triangles such that the total length of each triangle (the sum of the individual lengths) is minimized. Give a DP formulation for this problem. Classify it into one of the four categories and derive a parallel formulation for p processing elements. Determine its parallel run time, speedup, and isoefficiency function.

Hint: This problem is similar to the optimal matrix-parenthesization problem.