## 9.7 Bibliographic RemarksKnuth [Knu73] discusses sorting networks and their history. The question of whether a sorting network could sort n elements in time O (log n) remained open for a long time. In 1983, Ajtai, Komlos, and Szemeredi [AKS83] discovered a sorting network that could sort n elements in time O (log n) by using O (n log n) comparators. Unfortunately, the constant of their sorting network is quite large (many thousands), and thus is not practical. The bitonic sorting network was discovered by Batcher [Bat68], who also discovered the network for odd-even sort. These were the first networks capable of sorting n elements in time O (log The odd-even transposition sort is described by Knuth [Knu73]. Several early references to parallel sorting by odd-even transposition are given by Knuth [Knu73] and Kung [Kun80]. The block-based extension of the algorithm is due to Baudet and Stevenson [BS78]. Another variation of block-based odd-even transposition sort that uses bitonic merge-split is described by DeWitt, Friedland, Hsiao, and Menon [DFHM82]. Their algorithm uses p processes and runs in time O (n + n log(n/p)). In contrast to the algorithm of Baudet and Stevenson [BS78], which is faster but requires 4n/p storage locations in each process, the algorithm of DeWitt et al. requires only (n/p) + 1 storage locations to perform the compare-split operation. The shellsort algorithm described in Section 9.3.2 is due to Fox et al. [FJL The sequential quicksort algorithm is due to Hoare [Hoa62]. Sedgewick [Sed78] provides a good reference on the details of the implementation and how they affect its performance. The random pivot-selection scheme is described and analyzed by Robin [Rob75]. The algorithm for sequence partitioning on a single process was suggested by Sedgewick [Sed78] and used in parallel formulations by Raskin [Ras78], Deminet [Dem82], and Quinn [Qui88]. The CRCW PRAM algorithm (Section 9.4.2) is due to Chlebus and Vrto [CV91]. Many other quicksort-based algorithms for PRAM and shared-address-space parallel computers have been developed that can sort n elements in time Q(log n) by using Q(n) processes. Martel and Gusfield [MG89] developed a quicksort algorithm for a CRCW PRAM that requires space O (n The sequential bucket sort algorithm was first proposed by Isaac and Singleton in 1956. Hirschberg [Hir78] proposed a bucket sort algorithm for the EREW PRAM model. This algorithm sorts n elements in the range [0...n - 1] in time Q(log n) by using n processes. A side effect of this algorithm is that duplicate elements are eliminated. Their algorithm requires space Q(n The sequential sample sort algorithm was discovered by Frazer and McKellar [FM70]. The parallel sample sort algorithm (Section 9.5) was discovered by Shi and Schaeffer [SS90]. Several parallel formulations of sample sort for different parallel architectures have been proposed. Abali, Ozguner, and Bataineh [AOB93] presented a splitter selection scheme that guarantees that the number of elements ending up in each bucket is n/p. Their algorithm requires time O ((n log n)/p + p log Many other parallel sorting algorithms have been proposed. Various parallel sorting algorithms can be efficiently implemented on a PRAM model or on shared-address-space computers. Akl [Akl85], Borodin and Hopcroft [BH82], Shiloach and Vishkin [SV81], and Bitton, DeWitt, Hsiao, and Menon [BDHM84] provide a good survey of the subject. Valiant [Val75] proposed a sorting algorithm for a shared-address-space SIMD computer that sorts by merging. It sorts n elements in time O (log n log log n) by using n/2 processes. Reischuk [Rei81] was the first to develop an algorithm that sorted n elements in time Q(log n) for an n-process PRAM. Cole [Col88] developed a parallel merge-sort algorithm that sorts n elements in time Q(log n) on an EREW PRAM. Natvig [Nat90] has shown that the constants hidden behind the asymptotic notation are very large. In fact, the Q(log |