9.6 Other Sorting Algorithms
As mentioned in the introduction to this chapter, there are many sorting algorithms, and we cannot explore them all in this chapter. However, in this section we briefly present two additional sorting algorithms that are important both practically and theoretically. Our discussion of these schemes will be brief. Refer to the bibliographic remarks (Section 9.7) for references on these and other algorithms.
9.6.1 Enumeration Sort
All the sorting algorithms presented so far are based on compare-exchange operations. This section considers an algorithm based on enumeration sort, which does not use compare-exchange. The basic idea behind enumeration sort is to determine the rank of each element. The rank of an element ai is the number of elements smaller than ai in the sequence to be sorted. The rank of ai can be used to place it in its correct position in the sorted sequence. Several parallel algorithms are based on enumeration sort. Here we present one such algorithm that is suited to the CRCW PRAM model. This formulation sorts n elements by using n2 processes in time Q(1).
Assume that concurrent writes to the same memory location of the CRCW PRAM result in the sum of all the values written being stored at that location (Section 2.4.1). Consider the n2 processes as being arranged in a two-dimensional grid. The algorithm consists of two steps. During the first step, each column j of processes computes the number of elements smaller than aj . During the second step, each process P1,j of the first row places aj in its proper position as determined by its rank. The algorithm is shown in Algorithm 9.7. It uses an auxiliary array C [1...n] to store the rank of each element. The crucial steps of this algorithm are lines 7 and 9. There, each process Pi,j, writes 1 in C [j] if the element A[i] is smaller than A[j] and writes 0 otherwise. Because of the additive-write conflict resolution scheme, the effect of these instructions is to count the number of elements smaller than A[j] and thus compute its rank. The run time of this algorithm is Q(1). Modifications of this algorithm for various parallel architectures are discussed in Problem 9.26.
1. procedure ENUM SORT (n) 2. begin 3. for each process P1,j do 4. C[j] :=0; 5. for each process Pi,j do 6. if (A[i] < A[j]) or ( A[i]= A[j] and i < j) then 7. C[j] := 1; 8. else 9. C[j] := 0; 10. for each process P1,j do 11. A[C[j]] := A[j]; 12. end ENUM_SORT
9.6.2 Radix Sort
The radix sort algorithm relies on the binary representation of the elements to be sorted. Let b be the number of bits in the binary representation of an element. The radix sort algorithm examines the elements to be sorted r bits at a time, where r < b. Radix sort requires b/r iterations. During iteration i , it sorts the elements according to their ith least significant block of r bits. For radix sort to work properly, each of the b/r sorts must be stable. A sorting algorithm is stable if its output preserves the order of input elements with the same value. Radix sort is stable if it preserves the input order of any two r -bit blocks when these blocks are equal. The most common implementation of the intermediate b/r radix-2r sorts uses enumeration sort (Section 9.6.1) because the range of possible values [0...2r - 1] is small. For such cases, enumeration sort significantly outperforms any comparison-based sorting algorithm.
Consider a parallel formulation of radix sort for n elements on a message-passing computer with n processes. The parallel radix sort algorithm is shown in Algorithm 9.8. The main loop of the algorithm (lines 3-17) performs the b/r enumeration sorts of the r-bit blocks. The enumeration sort is performed by using the prefix_sum() and parallel_sum() functions. These functions are similar to those described in Sections 4.1 and 4.3. During each iteration of the inner loop (lines 6-15), radix sort determines the position of the elements with an r-bit value of j . It does this by summing all the elements with the same value and then assigning them to processes. The variable rank holds the position of each element. At the end of the loop (line 16), each process sends its element to the appropriate process. Process labels determine the global order of sorted elements.
Algorithm 9.8 A parallel radix sort algorithm, in which each element of the array A[1...n] to be sorted is assigned to one process. The function prefix sum() computes the prefix sum of the flag variable, and the function parallel sum() returns the total sum of the flag variable.
1. procedure RADIX SORT(A, r) 2. begin 3. for i := 0 to b/r - 1 do 4. begin 5. offset := 0; 6. for j := 0 to 2r - 1 do 7. begin 8. flag := 0; 9. if the ith least significant r-bit block of A[Pk] = j then 10. flag := 1; 11. index := prefix_sum(flag) 12. if flag = 1 then 13. rank := offset + index; 14. offset := parallel_sum(flag); 15. endfor 16. each process Pk send its element A[Pk] to process Prank; 17. endfor 18. end RADIX_SORT
As shown in Sections 4.1 and 4.3, the complexity of both the parallel_sum() and prefix_sum() operations is Q(log n) on a message-passing computer with n processes. The complexity of the communication step on line 16 is Q(n). Thus, the parallel run time of this algorithm is