## Bibliography[ABJ82] S. G. Akl, D. T. Bernard, and R. J. Jordan. Design and implementation of a parallel tree search algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-4:192-203, 1982. [ACM91] ACM. Resources in Parallel and Concurrent Systems. ACM Press, New York, NY, 1991. [ACS89a] A. Aggarwal, A. K. Chandra, and M. Snir. A model for hierarchical memory. Technical Report RC 15118 (No. 67337), IBM T. J. Watson Research Center, Yorktown Heights, NY, 1989. [ACS89b] A. Aggarwal, A. K. Chandra, and M. Snir. On communication latency in PRAM computations. Technical Report RC 14973 (No. 66882), IBM T. J. Watson Research Center, Yorktown Heights, NY, 1989. [ACS89c] A. Aggarwal, A. K. Chandra, and M. Snir. Communication complexity of PRAMs. Technical Report RC 14998 (64644), IBM T. J. Watson Research Center, Yorktown Heights, NY, 1989. [ADJ [AFKW90] I. Angus, G. C. Fox, J. Kim, and D. W. Walker. Solving Problems on Concurrent Processors: Software for Concurrent Processors: Volume II. Prentice-Hall, Englewood Cliffs, NJ, 1990. [AG94] G. S. Almasi and A. Gottlieb. Highly Parallel Computing. Benjamin/Cummings, Redwood City, CA, 1994. (Second Edition). [Aga89] A. Agarwal. Performance tradeoffs in multithreaded processors. Technical Report 89-566, Massachusetts Institute of Technology, Microsystems Program Office, Cambridge, MA, 1989. [Aga91] A. Agarwal. Performance tradeoffs in multithreaded processors. Technical report MIT/LCS/TR 501; VLSI memo no. 89-566, Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, MA, 1991. [AGGM90] A. Averbuch, E. Gabber, B. Gordissky, and Y. Medan. A parallel FFT on an MIMD machine. Parallel Computing, 15:61-74, 1990. [Agh86] G. Agha. Actors: A Model of Concurrent Computation in Distributed Systems. MIT Press, Cambridge, MA, 1986. [AHMP87] H. Alt, T. Hagerup, K. Mehlhorn, and F. P. Preparata. Deterministic simulation of idealized parallel computers on more realistic ones. SIAM Journal of Computing, 16(5):808-835, October 1987. [AHU74] A. V. Aho, J. E. Hopcroft, and J. D. Ullman. The Design and Analysis of Computer Algorithms. Addison-Wesley, Reading, MA, 1974. [AJM88] D. P. Agrawal, V. K. Janakiram, and R. Mehrotra. A randomized parallel branch-and-bound algorithm. In Proceedings of the 1988 International Conference on Parallel Processing, 1988. [AK84] M. J. Atallah and S. R. Kosaraju. Graph problems on a mesh-connected processor array. Journal of ACM, 31(3):649-667, July 1984. [Akl85] S. G. Akl. Parallel Sorting Algorithms. Academic Press, San Diego, CA, 1985. [Akl89] S. G. Akl. The Design and Analysis of Parallel Algorithms. Prentice-Hall, Englewood Cliffs, NJ, 1989. [Akl97] S. G. Akl. Parallel Computation Models and Methods. Prentice-Hall, Englewood Cliffs, NJ, 1997. [AKR89] S. Arvindam, V. Kumar, and V. N. Rao. Floorplan optimization on multiprocessors. In Proceedings of the 1989 International Conference on Computer Design, 1989. Also published as Technical Report ACT-OODS-241-89, Microelectronics and Computer Corporation, Austin, TX. [AKR90] S. Arvindam, V. Kumar, and V. N. Rao. Efficient parallel algorithms for search problems: Applications in VLSI CAD. In Proceedings of the Third Symposium on the Frontiers of Massively Parallel Computation, 1990. [AKRS91] S. Arvindam, V. Kumar, V. N. Rao, and V. Singh. Automatic test pattern generation on multiprocessors. Parallel Computing, 17, number 12:1323-1342, December 1991. [AKS83] M. Ajtai, J. Komlos, and E. Szemeredi. An O (n log n) sorting network. In Proceedings of the 15th Annual ACM Symposium on Theory of Computing, 1-9, 1983. [AL93] S. G. Akl and K. A. Lyons. Parallel Computational Geometry. Prentice-Hall, Englewood Cliffs, NJ, 1993. [AM88] T. S. Abdelrahman and T. N. Mudge. Parallel branch-and-bound algorithms on hypercube multiprocessors. In Proceedings of the Third Conference on Hypercubes, Concurrent Computers, and Applications, 1492-1499, New York, NY, 1988. ACM Press. [Amd67] G. M. Amdahl. Validity of the single processor approach to achieving large scale computing capabilities. In AFIPS Conference Proceedings, 483-485, 1967. [And91] G. R. Andrews. Concurrent Programming: Principles and Practice. Benjamin/Cummings, Redwood City, CA, 1991. [AOB93] B. Abali, F. Ozguner, and A. Bataineh. Balanced parallel sort on hypercube multiprocessors. IEEE Transactions on Parallel and Distributed Systems, 4(5):572-581, May 1993. [AS87] B. Awerbuch and Y. Shiloach. New connectivity and MSF algorithms for shuffle-exchange network and PRAM. IEEE Transactions on Computers, C- 36(10):1258-1263, October 1987. [AU72] A. V. Aho and J. D. Ullman. The Theory of Parsing, Translation and Compiling: Volume 1, Parsing. Prentice-Hall, Englewood Cliffs, NJ, 1972. [B [BA82] M. Ben-Ari. Principles of Concurrent Programming. Prentice-Hall, Englewood Cliffs, NJ, 1982. [Bab88] R. G. Babb. Programming Parallel Processors. Addison-Wesley, Reading, MA, 1988. [Bai90] D. H. Bailey. FFTs in external or hierarchical memory. Journal of Supercomputing, 4:23-35, 1990. [Bar68] G. H. Barnes. The ILLIAC IV computer. IEEE Transactions on Computers, C-17(8):746-757, 1968. [Bat68] K. E. Batcher. Sorting networks and their applications. In Proceedings of the 1968 Spring Joint Computer Conference , 307-314, 1968. [Bat76] K. E. Batcher. The Flip network in STARAN. In Proceedings of International Conference on Parallel Processing, 65-71, 1976. [Bat80] K. E. Batcher. Design of a massively parallel processor. IEEE Transactions on Computers, 836-840, September 1980. [Bau78] G. M. Baudet. The Design and Analysis of Algorithms for Asynchronous Multiprocessors. Ph.D. Thesis, Carnegie-Mellon University, Pittsburgh, PA, 1978. [BB90] K. P. Belkhale and P. Banerjee. Approximate algorithms for the partitionable independent task scheduling problem. In Proceedings of the 1990 International Conference on Parallel Processing, I72-I75, 1990. [BBN89] BBN Advanced Computers Inc. TC-2000 Technical Product Summary. Cambridge, MA. 1989. [BCCL95] R. E. Bixby, W. Cook, A. Cox, and E. K. Lee. Parallel mixed integer programming. Technical Report CRPC TR 95554, Center for Research on Parallel Computation, Research Monograph, 1995. [BCJ90] E. C. Bronson, T. L. Casavant, and L. H. Jamieson. Experimental application-driven architecture analysis of an SIMD/MIMD parallel processing system. IEEE Transactions on Parallel and Distributed Systems, 1(2):195-205, 1990. [BDHM84] D. Bitton, D. J. DeWitt, D. K. Hsiao, and M. J. Menon. A taxonomy of parallel sorting. Computing Surveys, 16(3):287-318, September 1984. [Bel57] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, NJ, 1957. [Bel58] R. Bellman. On a routing problem. Quarterly of Applied Mathematics, 16(1):87-90, 1958. [Ben80] J. L. Bentley. A parallel algorithm for constructing minimum spanning trees. Journal of the ACM, 27(1):51-59, March 1980. [Ber84] S. Berkowitz. On computing the determinant in small parallel time using a small number of processors. Information Processing Letters, 18(3):147-150, March 1984. [Ber89] J. Berntsen. Communication efficient matrix multiplication on hypercubes. Parallel Computing, 12:335-342, 1989. [BH82] A. Borodin and J. E. Hopcroft. Routing merging and sorting on parallel models of computation. In Proceedings of the 14th Annual ACM Symposium on Theory of Computing, 338-344, May 1982. [Bix91] R. E. Bixby. Two applications of linear programming. In Proceedings of the Workshop on Parallel Computing of Discrete Optimization Problems, 1991. [BJK [BKH89] S. Bershader, T. Kraay, and J. Holland. The giant-Fourier-transform. In Proceedings of the Fourth Conference on Hypercubes, Concurrent Computers, and Applications: Volume I, 387-389, 1989. [Ble90] G. E. Blelloch. Vector Models for Data-Parallel Computing. MIT Press, Cambridge, MA, 1990. [BMCP98] A. Brungger, A. Marzetta, J. Clausen, and M. Perregaard. Solving large-scale qap problems in parallel with the search library zram. Journal of Parallel and Distributed Computing, 50:157-169, 1998. [BNK92] A. Bar-Noy and S. Kipnis. Designing broadcasting algorithms in the postal model for message-passing systems. In Proceedings of 4th ACM Symposium on Parallel Algorithms and Architectures, 13-22, 1992. [BOS [BR90] R. Boppana and C. S. Raghavendra. On optimal and practical routing methods for a massive data movement operation on hypercubes. Technical report, University of Southern California, Los Angeles, CA, 1990. [Bra97] R. Bramley. Technology news & reviews: Chemkin software; OpenMP Fortran Standard; ODE toolbox for Matlab; Java products; Scientific WorkPlace 3.0. IEEE Computational Science and Engineering, 4(4):75-78, October/December 1997. [Bro79] K. Brown. Dynamic programming in computer science. Technical Report CMU-CS-79-106, Carnegie Mellon University, Pittsburgh, PA, 1979. [BS78] G. M. Baudet and D. Stevenson. Optimal sorting algorithms for parallel computers. IEEE Transactions on Computers, C-27(1):84-87, January 1978. [BT89] D. P. Bertsekas and J. N. Tsitsiklis. Parallel and Distributed Computation: Numerical Methods. Prentice-Hall, NJ, 1989. [BT97] D. P. Bertsekas and J. N. Tsitsiklis. Parallel and Distributed Computation: Numerical Methods. Athena Scientific, 1997. [But97] D. R. Butenhof. Programming with POSIX Threads. Addison-Wesley, Reading, MA, 1997. [Buy99] R. Buyya, editor. High Performance Cluster Computing: Architectures and Systems. Prentice Hall, 1999. [BW89] M. L. Barton and G. R. Withers. Computing performance as a function of the speed, quantity, and the cost of processors. In Supercomputing '89 Proceedings, 759-764, 1989. [BW97] J. Beveridge and R. Wiener. Multithreading Applications in Win32: the Complete Guide to Threads. Addison-Wesley Developers Press, Reading, MA, 1997. [C [CAHH91] N. P. Chrisopchoides, M. Aboelaze, E. N. Houstis, and C. E. Houstis. The parallelization of some level 2 and 3 BLAS operations on distributed-memory machines. In Proceedings of the First International Conference of the Austrian Center of Parallel Computation. Springer-Verlag Series Lecture Notes in Computer Science, 1991. [Can69] L. E. Cannon. A cellular computer to implement the Kalman Filter Algorithm. Ph.D. Thesis, Montana State University, Bozman, MT, 1969. [Car89] G. F. Carey, editor. Parallel Supercomputing: Methods, Algorithms and Applications. Wiley, New York, NY, 1989. [CD87] S. Chandran and L. S. Davis. An approach to parallel vision algorithms. In R. Porth, editor, Parallel Processing. SIAM, Philadelphia, PA, 1987. [CDK [CG87] E. Chu and A. George. Gaussian elimination with partial pivoting and load balancing on a multiprocessor. Parallel Computing, 5:65-74, 1987. [CGK93] D. Challou, M. Gini, and V. Kumar. Parallel search algorithms for robot motion planning. In Proceedings of the IEEE Conference on Robotics and Automation, 46-51, 1993. [CGL92] D. E. Culler, M. Gunter, and J. C. Lee. Analysis of multithreaded microprocessors under multiprogramming. Report UCB/CSD 92/687, University of California, Berkeley, Computer Science Division, Berkeley, CA, May 1992. [Cha79] A. K. Chandra. Maximal parallelism in matrix multiplication. Technical Report RC-6193, IBM T. J. Watson Research Center, Yorktown Heights, NY, 1979. [Cha87] R. Chamberlain. An alternate view of LU factorization on a hypercube multiprocessor. In M. T. Heath, editor, Hypercube Multiprocessors 1987, 569-575. SIAM, Philadelphia, PA, 1987. [CJP83] H. Crowder, E. L. Johnson, and M. Padberg. Solving large-scale zero-one linear programming problem. Operations Research, 2:803-834, 1983. [CKP [CKP [CL93] B. Codenotti and M. Leoncini. Introduction to Parallel Processing. Addison-Wesley, 1993. [CLC81] F. Y. Chin, J. Lam, and I. Chen. Optimal parallel algorithms for the connected component problem. In Proceedings of the 1981 International Conference on Parallel Processing, 170-175, 1981. [CLC82] F. Y. Chin, J. Lam, and I. Chen. Efficient parallel algorithms for some graph problems. Communications of the ACM, 25(9):659-665, September 1982. [CLR90] T. H. Cormen, C. E. Leiserson, and R. L. Rivest. Introduction to Algorithms. MIT Press, McGraw-Hill, New York, NY, 1990. [CM82] K. M. Chandy and J. Misra. Distributed computation on graphs: Shortest path algorithms. Communications of the ACM, 25(11):833-837, November 1982. [CM98] B. Chapman and P. Mehrotra. OpenMP and HPF: Integrating two paradigms. Lecture Notes in Computer Science, 1470, 1998. [Col88] R. Cole. Parallel merge sort. SIAM Journal on Computing, 17(4):770-785, August 1988. [Col89] M. Cole. Algorithmic Skeletons: Structured Management of Parallel Computation. MIT Press, Cambridge, MA, 1989. [Con89] T. Conlon. Programming in PARLOG. Addison-Wesley, Reading, MA, 1989. [CR89] E. A. Carmona and M. D. Rice. A model of parallel performance. Technical Report AFWL-TR-89-01, Air Force Weapons Laboratory, 1989. [CR91] E. A. Carmona and M. D. Rice. Modeling the serial and parallel fractions of a parallel algorithm. Journal of Parallel and Distributed Computing, 1991. [CS88] B. V. Cherkassky and R. Smith. Efficient mapping and implementations of matrix algorithms on a hypercube. Journal of Supercomputing, 2:7-27, 1988. [CSG98] D. E. Culler, J. P. Singh, and A. Gupta. Parallel Computer Architecture: A Hardware/Software Approach. Morgan Kaufmann, 1998. [CT92] K. M. Chandy and S. Taylor. An Introduction to Parallel Programming. Jones and Bartlett, Austin, TX, 1992. [CV91] B. S. Chlebus and I. Vrto. Parallel quicksort. Journal of Parallel and Distributed Processing, 1991. [Cve87] Z. Cvetanovic. Performance analysis of the FFT algorithm on a shared-memory parallel architecture. IBM Journal of Research and Development, 31(4):435-451, 1987. [CWP98] A. Cohen, M. Woodring, and R. Petrusha. Win32 Multithreaded Programming. O'Reilly & Associates, 1998. [D [Dal87] W. J. Dally. A VLSI Architecture for Concurrent Data Structures. Kluwer Academic Publishers, Boston, MA, 1987. [Dal90a] W. J. Dally. Analysis of k-ary n-cube interconnection networks. IEEE Transactions on Computers, 39(6), June 1990. [Dal90b] W. J. Dally. Network and processor architecture for message-driven computers. In R. Sauya and G. Birtwistle, editors, VLSI and Parallel Computation. Morgan Kaufmann, San Mateo, CA, 1990. [Dav86] G. J. Davis. Column LU factorization with pivoting on a hypercube multiprocessor. SIAM Journal on Algebraic and Discrete Methods, 7:538-550, 1986. Also available as Technical Report ORNL-6219, Oak Ridge National Laboratory, Oak Ridge, TN, 1985. [DCG90] J. D. DeMello, J. L. Calvet, and J. M. Garcia. Vectorization and multitasking of dynamic programming in control: experiments on a CRAY-2. Parallel Computing, 13:261-269, 1990. [DDSV99] J. Dongarra, I. S. Duff, D. Sorensen, and H. V. Vorst. Numerical Linear Algebra for High Performance Computers (Software, Environments, Tools). SIAM, 1999. [DeC89] A. L. DeCegama. The Technology of Parallel Processing: Parallel Processing Architectures and VLSI Hardware: Volume 1. Prentice-Hall, Englewood Cliffs, NJ, 1989. [DEH89] P. M. Dew, R. A. Earnshaw, and T. R. Heywood. Parallel Processing for Computer Vision and Display. Addison-Wesley, Reading, MA, 1989. [Dem82] J. Deminet. Experiences with multiprocessor algorithms. IEEE Transactions on Computers, C-31(4):278-288, 1982. [DFHM82] D. J. DeWitt, D. B. Friedland, D. K. Hsiao, and M. J. Menon. A taxonomy of parallel sorting algorithms. Technical Report TR-482, Computer Sciences Department, University of Wisconsin, Madison, WI, 1982. [DFRC96] F. Dehne, A. Fabri, and A. Rau-Chaplin. Scalable parallel computational geometry for coarse grained multicomputers. International Journal on Computational Geometry, 6(3):379-400, 1996. [DHvdV93] J. W. Demmel, M. T. Heath, and H. A. van der Vorst. Parallel numerical linear algebra. Acta Numerica, 111-197, 1993. [Dij59] E. W. Dijkstra. A note on two problems in connection with graphs. Numerische Mathematik, 1:269-271, 1959. [DM93] S. Dutt and N. R. Mahapatra. Parallel A* algorithms and their performance on hypercube multiprocessors. In Proceedings of the Seventh International Parallel Processing Symposium, 797-803, 1993. [DM98] L. Dagum and R. Menon. OpenMP: An industry-standard API for shared-memory programming. IEEE Computational Science and Engineering, 5(1):46-55, January/March 1998. [DNS81] E. Dekel, D. Nassimi, and S. Sahni. Parallel matrix and graph algorithms. SIAM Journal on Computing, 10:657-673, 1981. [Dra96] D. G. Drake. Introduction to Java threads. JavaWorld: IDG's magazine for the Java community, 1(2), April 1996. [DRGNP] F. Darema-Rogers, D. George, V. Norton, and G. Pfister. VM parallel environment. In Proceedings of the IBM Kingston Parallel Processing Symposium. [DS86] W. J. Dally and C. L. Seitz. The torus routing chip. Journal of Distributed Computing, 1(3):187-196, 1986. [DS87] W. J. Dally and C. L. Seitz. Deadlock-free message routing in multiprocessor interconnection networks. IEEE Transactions on Computers, C-36(5):547- 553, 1987. [DSG83] E. W. Dijkstra, W. H. Seijen, and A. J. M. V. Gasteren. Derivation of a termination detection algorithm for a distributed computation. Information Processing Letters, 16(5):217-219, 1983. [DT89] L. Desbat and D. Trystram. Implementing the discrete Fourier transform on a hypercube vector-parallel computer. In Proceedings of the Fourth Conference on Hypercubes, Concurrent Computers, and Applications: Volume I, 407- 410, 1989. [DV87] K. A. Doshi and P. J. Varman. Optimal graph algorithms on a fixed-size linear array. IEEE Transactions on Computers, C-36(4):460-470, April 1987. [dV89] E. F. V.de Velde. Multicomputer matrix computations: Theory and practice. In Proceedings of the Fourth Conference on Hypercubes, Concurrent Computers, and Applications, 1303-1308, 1989. [DY81] N. Deo and Y. B. Yoo. Parallel algorithms for the minimum spanning tree problem. In Proceedings of the 1981 International Conference on Parallel Processing, 188-189, 1981. [Eck94] J. Eckstein. Parallel branch-and-bound methods for mixed-integer programming on the cm-5. SIAM Journal on Optimization, 4(4):794-814, 1994. [Eck97] J. Eckstein. Distributed versus centralized storage and control for parallel branch and bound: Mixed integer programming on the cm-5. Computational Optimization and Applications, 7(2):199-220, 1997. [Ede89] A. Edelman. Optimal matrix transposition and bit-reversal on hypercubes: Node address-memory address exchanges. Technical report, Thinking Machines Corporation, Cambridge, MA, 1989. [EDH80] O. I. El-Dessouki and W. H. Huen. Distributed enumeration on network computers. IEEE Transactions on Computers, C-29:818-825, September 1980. [EHHR88] S. C. Eisenstat, M. T. Heath, C. S. Henkel, and C. H. Romine. Modified cyclic algorithms for solving triangular systems on distributed-memory multiprocessors. SIAM Journal on Scientific and Statistical Computing, 9(3):589-600, 1988. [EHMN90] M. Evett, J. Hendler, A. Mahanti, and D. Nau. PRA*: A memory-limited heuristic search procedure for the connection machine. In Proceedings of the Third Symposium on the Frontiers of Massively Parallel Computation, 145- 149, 1990. [Ekl72] J. O. Eklundh. A fast computer method for matrix transposing. IEEE Transactions on Computers, 21(7):801-803, 1972. [Ert92] W. Ertel. OR-parallel theorem proving with random competition. In A. Voronokov, editor, LPAR '92: Logic Programming and Automated Reasoning, 226-237. Springer-Verlag, New York, NY, 1992. [EZL89] D. L. Eager, J. Zahorjan, and E. D. Lazowska. Speedup versus efficiency in parallel systems. IEEE Transactions on Computers, 38(3):408-423, 1989. [Fen81] T. Y. Feng. A survey of interconnection networks. IEEE Computer, 12-27, December 1981. [FF82] R. A. Finkel and J. P. Fishburn. Parallelism in alpha-beta search. Artificial Intelligence, 19:89-106, 1982. [FF86] G. C. Fox and W. Furmanski. Optimal communication algorithms on hypercube. Technical Report CCCP-314, California Institute of Technology, Pasadena, CA, 1986. [FJDS96] L. Fosdick, E. Jessup, G. Domik, and C. Schauble. Introduction to High-Performance Scientific Computing. MIT Press, 1996. [FJL [FK88] C. Ferguson and R. Korf. Distributed tree search and its application to alpha-beta pruning. In Proceedings of the 1988 National Conference on Artificial Intelligence, 1988. [FK89] H. P. Flatt and K. Kennedy. Performance of parallel processors. Parallel Computing, 12:1-20, 1989. [FKO86] E. Felten, S. Karlin, and S. W.Otto. Sorting on a hypercube. Caltech/JPL, 1986. Hm 244. [Fla90] H. P. Flatt. Further applications of the overhead model for parallel systems. Technical Report G320-3540, IBM Corporation, Palo Alto Scientific Center, Palo Alto, CA, 1990. [Flo62] R. W. Floyd. Algorithm 97: Shortest path. Communications of the ACM, 5(6):345, June 1962. [Fly72] M. J. Flynn. Some computer organizations and their effectiveness. IEEE Transactions on Computers, C-21(9):948-960, 1972. [Fly95] M. J. Flynn. Computer Architecture: Pipelined and Parallel Processor Design. Jones and Bartlett, 1995. [FM70] W. D. Frazer and A. C. McKellar. Samplesort: A sampling approach to minimal storage tree sorting. Journal of the ACM, 17(3):496-507, July 1970. [FM87] R. A. Finkel and U. Manber. DIB-a distributed implementation of backtracking. ACM Transactions on Programming Languages and Systems, 9(2):235-256, April 1987. [FM92] R. Frye and J. Myczkowski. Load balancing algorithms on the connection machine and their use in Monte-Carlo methods. In Proceedings of the Unstructured Scientific Computation on Multiprocessors Conference, 1992. [FMM94] R. Feldmann, P. Mysliwietz, and B. Monien. Studying overheads in massively parallel min/max-tree evaluation. In Proc. of the 6th ACM Symposium on Parallel Algorithms and Architectures, 94-103, 1994. [FOH87] G. C. Fox, S. W. Otto, and A. J. G. Hey. Matrix algorithms on a hypercube I: Matrix multiplication. Parallel Computing, 4:17-31, 1987. [Fos95] I. Foster. Designing and Building Parallel Programs: Concepts and Tools for Parallel Software Engineering. Addison-Wesley, 1995. [Fou94] T. J. Fountain. Parallel Computing: Principles and Practice. Cambridge University Press, 1994. [FR62] L. R. Ford and R. L. Rivest. Flows in Networks. Princeton University Press, Princeton, NJ, 1962. [Fra93] M. Franklin. The multiscalar architecture. Technical Report CS-TR-1993-1196, University of Wisconsin, 1993. [FTI90] M. Furuichi, K. Taki, and N. Ichiyoshi. A multi-level load balancing scheme for OR-parallel exhaustive search programs on the Multi-PSI. In Proceedings of the Second ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 50-59, 1990. [FW78] S. Fortune and J. Wyllie. Parallelism in random access machines. In Proceedings of ACM Symposium on Theory of Computing, 114-118, 1978. [Gal95] B. O. Gallmeister. Posix. 4 : Programming for the Real World. O'Reilly & Associates, 1995. [GBD [Gei85] G. A. Geist. Efficient parallel LU factorization with pivoting on a hypercube multiprocessor. Technical Report ORNL-6211, Oak Ridge National Laboratory, Oak Ridge, TN, 1985. [GGK [GGK93] A. Y. Grama, A. Gupta, and V. Kumar. Isoefficiency: Measuring the scalability of parallel algorithms and architectures. IEEE Parallel and Distributed Technology, 1(3):12-21, August 1993. [GH85] G. A. Geist and M. T. Heath. Parallel Cholesky factorization on a hypercube multiprocessor. Technical Report ORNL-6190, Oak Ridge National Laboratory, Oak Ridge, TN, 1985. [GH86] G. A. Geist and M. T. Heath. Matrix factorization on a hypercube multiprocessor. In M. T. Heath, editor, Hypercube Multiprocessors 1986, 161-180. SIAM, Philadelphia, PA, 1986. [GH01] S. Goedecker and A. Hoisie. Performance Optimization of Numerically Intensive Codes. SIAM, 2001. [Gib85] A. Gibbons. Algorithmic Graph Theory. Cambridge University Press, Cambridge, 1985. [Gib89] P. B. Gibbons. A more practical PRAM model. In Proceedings of the 1989 ACM Symposium on Parallel Algorithms and Architectures, 158-168, 1989. [GK91] A. Gupta and V. Kumar. The scalability of matrix multiplication algorithms on parallel computers. Technical Report TR 91-54, Department of Computer Science, University of Minnesota, Minneapolis, MN, 1991. A short version appears in Proceedings of 1993 International Conference on Parallel Processing, pages III-115-III-119, 1993. [GK93a] A. Gupta and V. Kumar. Performance properties of large scale parallel systems. Journal of Parallel and Distributed Computing, 19:234-244, 1993. Also available as Technical Report TR 92-32, Department of Computer Science, University of Minnesota, Minneapolis, MN. [GK93b] A. Gupta and V. Kumar. The scalability of FFT on parallel computers. IEEE Transactions on Parallel and Distributed Systems, 4(8):922-932, August 1993. A detailed version is available as Technical Report TR 90-53, Department of Computer Science, University of Minnesota, Minneapolis, MN. [GKP92] A. Grama, V. Kumar, and P. M. Pardalos. Parallel processing of discrete optimization problems. In Encyclopaedia of Microcomputers. Marcel Dekker Inc., New York, 1992. [GKR91] A. Y. Grama, V. Kumar, and V. N. Rao. Experimental evaluation of load balancing techniques for the hypercube. In Proceedings of the Parallel Computing '91 Conference, 497-514, 1991. [GKRS96]
A. Grama,
V. Kumar,
S. Ranka, and
V. Singh.
A [GKS92] A. Gupta, V. Kumar, and A. H. Sameh. Performance and scalability of preconditioned conjugate gradient methods on parallel computers. Technical Report TR 92-64, Department of Computer Science, University of Minnesota, Minneapolis, MN, 1992. A short version appears in Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing, pages 664-674, 1993. [GKT79] L. J. Guibas, H. T. Kung, and C. D. Thompson. Direct VLSI Implementation of Combinatorial Algorithms. In Proceedings of Conference on Very Large Scale Integration, California Institute of Technology, 509-525, 1979. [GL96a] G. H. Golub and C. V. Loan. Matrix Computations. The Johns Hopkins University Press, Baltimore, MD, 1996. [GL96b]
W. D. Gropp and
E. Lusk. User's Guide for [GLDS96] W. Gropp, E. Lusk, N. Doss, and A. Skjellum. A high-performance, portable implementation of the MPI message passing interface standard. Parallel Computing, 22(6):789-828, September 1996. [GLS99] W. Gropp, E. Lusk, and A. Skjellum. Using MPI. MIT Press, 1999. 2nd Edition. [GMB88] J. L. Gustafson, G. R. Montry, and R. E. Benner. Development of parallel methods for a 1024-processor hypercube. SIAM Journal on Scientific and Statistical Computing, 9(4):609-638, 1988. [GO93] G. H. Golub and J. M. Ortega. Scientific Computing: An Introduction with Parallel Computing. Academic Press, 1993. [GPS90] K. A. Gallivan, R. J. Plemmons, and A. H. Sameh. Parallel algorithms for dense linear algebra computations . SIAM Review, 32(1):54-135, March 1990. Also appears in K. A. Gallivan et al. Parallel Algorithms for Matrix Computations. SIAM, Philadelphia, PA, 1990. [GR88] G. A. Geist and C. H. Romine. LU factorization algorithms on distributed-memory multiprocessor architectures. SIAM Journal on Scientific and Statistical Computing, 9(4):639-649, 1988. Also available as Technical Report ORNL/TM-10383, Oak Ridge National Laboratory, Oak Ridge, TN, 1987. [GR90] A. Gibbons and W. Rytter. Efficient Parallel Algorithms. Cambridge University Press, Cambridge, UK, 1990. [Gre91] S. Green. Parallel Processing for Computer Graphics. MIT Press, Cambridge, MA, 1991. [GSNL98] W. Gropp, M. Snir, W. Nitzberg, and E. Lusk. MPI: The Complete Reference. MIT Press, 1998. [GT88] A. V. Goldberg and R. E. Tarjan. A new approach to the maximum-flow problem. Journal of the ACM, 35(4):921-940, October 1988. [Gup87] A. Gupta. Parallelism in Production Systems. Morgan Kaufmann, Los Altos, CA, 1987. [Gus88] J. L. Gustafson. Reevaluating Amdahl's law. Communications of the ACM, 31(5):532-533, 1988. [Gus92] J. L. Gustafson. The consequences of fixed time performance measurement. In Proceedings of the 25th Hawaii International Conference on System Sciences: Volume III, 113-124, 1992. [HB84] K. Hwang and F. A. Briggs. Computer Architecture and Parallel Processing. McGraw-Hill, New York, NY, 1984. [HB88] M. M. Huntbach and F. W. Burton. Alpha-beta search on virtual tree machines. Information Science, 44:3-17, 1988. [HCH95] F.-H. Hsu, M. S. Campbell, and A. J. Hoane. Deep Blue system overview. In Proceedings of the 1995 International Conference on Supercomputing, Barcelona, Spain, 240-244, 1995. [HCS79] D. S. Hirschberg, A. K. Chandra, and D. V. Sarwate. Computing connected components on parallel computers. Communications of the ACM, 22(8):461- 464, August 1979. [HD87] S.-R. Huang and L. S. Davis. A tight upper bound for the speedup of parallel best-first branch-and-bound algorithms. Technical report, Center for Automation Research, University of Maryland, College Park, MD, 1987. [HD89a] S. R. Huang and L. S. Davis. Parallel iterative a* search: An admissible distributed heuristic search algorithm. In Proceedings of the Eleventh International Joint Conference on Artificial Intelligence, 23-29, 1989. [HD89b] K. Hwang and D. DeGroot. Parallel Processing for Supercomputers and Artificial Intelligence. McGraw-Hill, New York, NY, 1989. [HDM97] J. Hill, S. Donaldson, and A. McEwan. Installation and user guide for the oxford bsp toolset: User guide for the oxford bsp toolset (v1.3) implementation of bsplib. Technical report, Oxford University Computing Laboratory, 1997. [Hea85] M. T. Heath. Parallel Cholesky factorization in message-passing multiprocessor environments. Technical Report ORNL-6150, Oak Ridge National Laboratory, Oak Ridge, TN, 1985. [HG97] S. Hamilton and L. Garber. Deep Blue's hardware-software synergy. IEEE Computer, 30(10):29-35, October 1997. [Hil85] W. D. Hillis. The Connection Machine. MIT Press, Cambridge, MA, 1985. [Hil90] M. D. Hill. What is scalability? Computer Architecture News, 18(4), 1990. [Hip89] P. G. Hipes. Matrix multiplication on the JPL/Caltech Mark IIIfp hypercube. Technical Report C3P 746, Concurrent Computation Program, California Institute of Technology, Pasadena, CA, 1989. [Hir76] D. S. Hirschberg. Parallel algorithms for the transitive closure and connected component problem. In Proceedings of the 8th Annual ACM Symposium on the Theory of Computing, 55-57, 1976. [Hir78] D. S. Hirschberg. Fast parallel sorting algorithms. Communications of the ACM, 21(8):657-666, August 1978. [HJ87] C.-T. Ho and S. L. Johnsson. Spanning balanced trees in Boolean cubes. Technical Report YALEU/DCS/RR-508, Department of Computer Science, Yale University, New Haven, CT, 1987. [HJE91] C.-T. Ho, S. L. Johnsson, and A. Edelman. Matrix multiplication on hypercubes using full bandwidth and constant storage. In Proceedings of the 1991 International Conference on Parallel Processing, 447-451, 1991. [HK96]
S. Hambrusch and
A. Khokhar. C [HLM84] R. E. Hiromoto, O. M. Lubeck, and J. Moore. Experiences with the Denelcor HEP. Parallel Computing, 1(3-4):197-206, 1984. [HLV90] S. H. S. Huang, H. Liu, and V. Vishwanathan. A sub-linear parallel algorithm for some dynamic programming problems. In Proceedings of the 1990 International Conference on Parallel Processing, III-261-III-264, 1990. [HM80] D. K. Hsiao and M. J. Menon. Parallel record-sorting methods for hardware realization. Osu-cisrc-tr-80-7, Computer Science Information Department, Ohio State University, Columbus, OH, 1980. [HMT [HNR90] P. Heidelberger, A. Norton, and J. T. Robinson. Parallel quicksort using fetch-and-add. IEEE Transactions on Computers, C-39(1):133-138, January 1990. [Hoa62] C. A. R. Hoare. Quicksort. Computer Journal, 5:10-15, 1962. [HP89] S. W. Hornick and F. P. Preparata. Deterministic PRAM simulation with constant redundancy. In Proceedings of the 1989 ACM Symposium on Parallel Algorithms and Architectures, 103-109, 1989. [HQ91] P. J. Hatcher and M. J. Quinn. Data Parallel Programming. MIT Press, Cambridge, MA, 1991. [HR88] M. T. Heath and C. H. Romine. Parallel solution of triangular systems on distributed-memory multiprocessors. SIAM Journal on Scientific and Statistical Computing, 9(3):558-588, 1988. [HR91] C.-T. Ho and M. T. Raghunath. Efficient communication primitives on circuit-switched hypercubes. In Sixth Distributed Memory Computing Conference Proceedings, 390-397, 1991. [HS78] E. Horowitz and S. Sahni. Fundamentals of Computer Algorithms. Computer Science Press, Rockville, MD, 1978. [HS86] W. D. Hillis and G. L. Steele. Data parallel algorithms. Communications of the ACM, 29(12):1170-1183, 1986. [Hsu90] F.-H. Hsu. Large scale parallelization of alpha-beta search: An algorithmic and architectural study with computer chess. Technical report, Carnegie Mellon University, Pittsburgh, PA, 1990. Ph.D. Thesis. [Hua85] M. A. Huang. Solving some graph problems with optimal or near-optimal speedup on mesh-of-trees networks. In Proceedings of the 26th Annual IEEE Symposium on Foundations of Computer Science, 232-340, 1985. [HX98] K. Hwang and Z. Xu. Scalable Parallel Computing. McGraw-Hill, New York, NY, 1998. [Hyd99] P. Hyde. Java Thread Programming. Sams, 1999. [IPS91] O. H. Ibarra, T. C. Pong, and S. M. Sohn. Parallel recognition and parsing on the hypercube. IEEE Transactions on Computers, 40(6):764-770, June 1991. [IYF79] M. Imai, Y. Yoshida, and T. Fukumura. A parallel searching scheme for multiprocessor systems and its application to combinatorial problems. In Proceedings of the International Joint Conference on Artificial Intelligence, 416-418, 1979. [Jaj92] J. Jaja. An Introduction to Parallel Algorithms. Addison-Wesley, Reading, MA, 1992. [JAM87] V. K. Janakiram, D. P. Agrawal, and R. Mehrotra. Randomized parallel algorithms for Prolog programs and backtracking applications. In Proceedings of the 1987 International Conference on Parallel Processing, 278-281, 1987. [JAM88] V. K. Janakiram, D. P. Agrawal, and R. Mehrotra. A randomized parallel backtracking algorithm. IEEE Transactions on Computers, C-37(12), 1988. [JGD87] L. H. Jamieson, D. B. Gannon, and R. J. Douglass, editors. The Characteristics of Parallel Algorithms. MIT Press, Cambridge, MA, 1987. [JH88] S. L. Johnsson and C.-T. Ho. Matrix transposition on Boolean n-cube configured ensemble architectures. SIAM Journal on Matrix Analysis and Applications, 9(3):419-454, July 1988. [JH89] S. L. Johnsson and C.-T. Ho. Optimum broadcasting and personalized communication in hypercubes. IEEE Transactions on Computers, 38(9):1249- 1268, September 1989. [JH91] S. L. Johnsson and C.-T. Ho. Optimal all-to-all personalized communication with minimum span on Boolean cubes. In Sixth Distributed Memory Computing Conference Proceedings, 299-304, 1991. [JKFM89] S. L. Johnsson, R. Krawitz, R. Frye, and D. McDonald. A radix-2 FFT on the connection machine. Technical report, Thinking Machines Corporation, Cambridge, MA, 1989. [JNS97] L. Johnson, G. Nemhauser, and M. Savelsbergh. Progress in integer programming: An exposition. Technical report, School of Industrial and Systems Engineering, Georgia Institute of Technology, 1997. Available from http://akula.isye.gatech.edu/mwps/mwps.html. [Joh77] D. B. Johnson. Efficient algorithms for shortest paths in sparse networks. Journal of the ACM, 24(1):1-13, March 1977. [Joh84] S. L. Johnsson. Combining parallel and sequential sorting on a boolean n-cube. In Proceedings of International Conference on Parallel Processing, 1984. [Joh87] S. L. Johnsson. Communication efficient basic linear algebra computations on hypercube architectures. Journal of Parallel and Distributed Computing, 4(2):133-172, April 1987. [Joh90] S. L. Johnsson. Communication in network architectures. In R. Suaya and G. Birtwistle, editors, VLSI and Parallel Computation, 223-389. Morgan Kaufmann, San Mateo, CA, 1990. [JP93] M. T. Jones and P. E. Plassmann. A parallel graph coloring heuristic. SIAM Journal on Scientific Computing, 14:654-669, 1993. [JS87] J.-F. Jenq and S. Sahni. All pairs shortest paths on a hypercube multiprocessor. In Proceedings of the 1987 International Conference on Parallel Processing, 713-716, 1987. [KA88] R. A. Kamin and G. B. Adams. Fast Fourier transform algorithm design and tradeoffs. Technical Report RIACS TR 88.18, NASA Ames Research Center, Moffet Field, CA, 1988. [KA99a] Y.-K. Kwok and I. Ahmad. Benchmarking and comparison of the task graph scheduling algorithms. Journal of Parallel and Distributed Computing, 59:381-422, 1999. [KA99b] Y.-K. Kwok and I. Ahmad. Static scheduling algorithms for allocating directed task graphs to multiprocessors. ACM Computing Surveys, 31(4):406- 471, 1999. [KB57] T. C. Koopmans and M. J. Beckmann. Assignment problems and the location of economic activities. Econometrica, 25:53-76, 1957. [Ken90] Kendall Square Research Corporation. KSR-1 Overview. Waltham, MA. 1990. [KF90] A. H. Karp and H. P. Flatt. Measuring parallel processor performance. Communications of the ACM, 33(5):539-543, 1990. [KG94] V. Kumar and A. Gupta. Analyzing scalability of parallel algorithms and architectures. Journal of Parallel and Distributed Computing, 22(3):379-391, 1994. Also available as Technical Report TR 91-18, Department of Computer Science Department, University of Minnesota, Minneapolis, MN. [KGK90] V. Kumar, P. S. Gopalakrishnan, and L. N. Kanal, editors. Parallel Algorithms for Machine Intelligence and Vision. Springer-Verlag, New York, NY, 1990. [KGR94] V. Kumar, A. Grama, and V. N. Rao. Scalable load balancing techniques for parallel computers. Journal of Parallel and Distributed Computing, 22(1):60- 79, July 1994. [KH67] R. M. Karp and M. H. Held. Finite state processes and dynamic programming. SIAM Journal of Applied Math, 15:693-718, 1967. [KH83] M. Kumar and D. S. Hirschberg. An efficient implementation of Batcher's odd-even merge algorithm and its application in parallel sorting schemes. IEEE Transactions on Computers, C-32, March 1983. [KK79] P. Kermani and L. Kleinrock. Virtual cut-through: A new communication switching technique. Computer Networks, 3(4):267-286, 1979. [KK83] V. Kumar and L. N. Kanal. A general branch-and-bound formulation for understanding and synthesizing and/or tree search procedures. Artificial Intelligence, 21:179-198, 1983. [KK84] V. Kumar and L. N. Kanal. Parallel branch-and-bound formulations for and/or tree search. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-6:768-778, 1984. [KK88a] L. N. Kanal and V. Kumar. Search in Artificial Intelligence. Springer-Verlag, New York, NY, 1988. [KK88b] V. Kumar and L. N. Kanal. The CDP: A unifying formulation for heuristic search, dynamic programming, and branch-and-bound. In L. N. Kanal and V. Kumar, editors, Search in Artificial Intelligence, 1-27. Springer-Verlag, New York, NY, 1988. [KK93] G. Karypis and V. Kumar. Efficient Parallel Mappings of a Dynamic Programming Algorithm. In Proceedings of 7th International Parallel Processing Symposium, number 563-568, 1993. [KK94] G. Karypis and V. Kumar. Unstructured tree search on simd parallel computers. Journal of Parallel and Distributed Computing, 22(3):379-391, September 1994. [KK99] G. Karypis and V. Kumar. Parallel multilevel k-way partitioning for irregular graphs. SIAM Review, 41(2):278-300, 1999. [KKKS94] L. N. Kanal, V. Kumar, H. Kitano, and C. Suttner, editors. Parallel Processing for Artificial Intelligence. North-Holland, Amsterdam, The Netherlands, 1994. [KN91] K. Kimura and I. Nobuyuki. Probabilistic analysis of the efficiency of the dynamic load distribution. In Sixth Distributed Memory Computing Conference Proceedings, 1991. [Knu73] D. E. Knuth. The Art of Computer Programming: Sorting and Searching. Addison-Wesley, Reading, MA, 1973. [Kor81] W. Kornfeld. The use of parallelism to implement a heuristic search. In Proceedings of the International Joint Conference on Artificial Intelligence, 575-580, 1981. [Kow88] J. S. Kowalik. Parallel Computation and Computers for Artificial Intelligence. Kluwer Academic Publishers, Boston, MA, 1988. [KP92] C. Kaklamanis and G. Persiano. Branch-and-bound and backtrack search on mesh-connected arrays of processors. In Proceedings of Fourth Annual Symposium on Parallel Algorithms and Architectures, 118-126, 1992. [KR87a] V. K. P. Kumar and C. S. Raghavendra. Array processor with multiple broadcasting. Journal of Parallel and Distributed Computing, 173-190, 1987. [KR87b] V. Kumar and V. N. Rao. Parallel depth-first search, part II: Analysis. International Journal of Parallel Programming, 16(6):501-519, 1987. [KR88] R. M. Karp and V. Ramachandran. A survey of complexity of algorithms for shared-memory machines. Technical Report 408, University of California, Berkeley, 1988. [KR89] V. Kumar and V. N. Rao. Load balancing on the hypercube architecture. In Proceedings of the Fourth Conference on Hypercubes, Concurrent Computers, and Applications, 603-608, 1989. [KRR88] V. Kumar, K. Ramesh, and V. N. Rao. Parallel best-first search of state-space graphs: A summary of results. In Proceedings of the 1988 National Conference on Artificial Intelligence, 122-126, 1988. [KRS88] C. P. Kruskal, L. Rudolph, and M. Snir. A complexity theory of efficient parallel algorithms. Technical Report RC13572, IBM T. J. Watson Research Center, Yorktown Heights, NY, 1988. [Kru56] J. B. Kruskal. On the shortest spanning subtree of a graph and the traveling salesman problem. In Proceedings of the AMS, volume 7, 48-50, 1956. [KS88] J. Kuehn and B. Smith. The horizon supercomputing system: architecture and software. In Proceedings of Supercomputing Conference, 28-34, 1988. [KS91a] L. V. Kale and V. Saletore. Efficient parallel execution of IDA* on shared and distributed-memory multiprocessors. In Sixth Distributed Memory Computing Conference Proceedings, 1991. [KS91b] V. Kumar and V. Singh. Scalability of parallel algorithms for the all-pairs shortest path problem. Journal of Parallel and Distributed Computing, 13(2):124-138, October 1991. A short version appears in the Proceedings of the International Conference on Parallel Processing, 1990. [KSS95] S. Kleiman, D. Shah, and B. Smaalders. Programming with Threads. SunSoft Press, Mountainview, CA, 1995. [KU86] A. R. Karlin and E. Upfal. Parallel hashing - an efficient implementation of shared memory. In Proceedings of 18th ACM Conference on Theory of Computing, 160-168, 1986. [Kun80] J. T. Kung. The structure of parallel algorithms. In M. Yovits, editor, Advances in Computing, 73-74. Academic Press, San Diego, CA, 1980. [Kun86] H. T. Kung. Memory requirements for balanced computer architectures. In Proceedings of the 1986 IEEE Symposium on Computer Architecture, 49-54, 1986. [KV92] K. Kreeger and N. R. Vempaty. Comparison of meshes vs. hypercubes for data rearrangement. Technical Report UCF-CS-92-28, Department of Computer Science, University of Central Florida, Orlando, FL, 1992. [KZ88] R. M. Karp and Y. Zhang. A randomized parallel branch-and-bound procedure. In Proceedings of the ACM Annual Symposium on Theory of Computing, 290-300, 1988. [Law75] D. H. Lawrie. Access and alignment of data in an array processor. IEEE Transactions on Computers, C-24(1):1145-1155, 1975. [LB95a] B. Lewis and D. J. Berg. Threads Primer: A Guide to Multithreaded Programming. Prentice Hall PTR/Sun Microsystems Press, 1995. [LB95b] B.-H. Lim and R. Bianchini. Limits on the performance benefits of multithreading and prefetching. Research report RC 20238 (89547), IBM T. J. Watson Research Center, Yorktown Heights, NY, October 1995. [LB97] B. Lewis and D. J. Berg. Multithreaded Programming with Pthreads. Prentice Hall PTR/Sun Microsystems Press, 1997. [LB98] T. G. Lewis and D. Berg. Multithreaded Programming with PThreads. Sun Microsystems Press / Prentice Hall, 1998. [LC88] G.-J. Li and T. Coleman. A parallel triangular solver for a hypercube multiprocessor. SIAM Journal on Scientific and Statistical Computing, 9:485-502, 1988. [LC89] G.-J. Li and T. Coleman. A new method for solving triangular systems on distributed memory message passing multiprocessors. SIAM Journal on Scientific and Statistical Computing, 10:382-396, 1989. [LD90] S. Lakshmivarahan and S. K. Dhall. Analysis and Design of Parallel Algorithms: Arithmetic and Matrix Problems. McGraw-Hill, New York, NY, 1990. [LDP89] M. R. Leuze, L. W. Dowdy, and K. H. Park. Multiprogramming a distributed-memory multiprocessor. Concurrency: Practice and Experience, 1(1):19-33, September 1989. [Lea99] D. Lea. Concurrent Programming in Java, Second Edition: Design Principles and Patterns. Addison-Wesley, 1999. [Lei83] F. T. Leighton. Parallel computation using mesh of trees. In Proceedings of International Workshop on Graph-Theoretic Concepts in Computer Science, 1983. [Lei85a] F. T. Leighton. Tight bounds on the complexity of parallel sorting. IEEE Transactions on Computers, C-34(4):344-354, April 1985. [Lei85b] C. E. Leiserson. Fat-trees: Universal networks for hardware efficient supercomputing. In Proceedings of the 1985 International Conference on Parallel Processing, 393-402, 1985. [Lei92] F. T. Leighton. Introduction to Parallel Algorithms and Architectures. Morgan Kaufmann, San Mateo, CA, 1992. [LER92] T. G. Lewis and H. El-Rewini. Introduction to Parallel Computing. Prentice-Hall, Englewood Cliffs, NJ, 1992. [Les93] B. P. Lester. The Art of Parallel Programming. Prentice-Hall, Englewood Cliffs, NJ, 1993. [Lev87] S. P. Levitan. Measuring communications structures in parallel architectures and algorithms. In L. H. Jamieson, D. B. Gannon, and R. J. Douglass, editors, The Characteristics of Parallel Algorithms. MIT Press, Cambridge, MA, 1987. [Lew91] D. A. Lewine. Posix Programmer's Guide: Writing Portable Unix Programs with the Posix. 1 Standard. O'Reilly & Associates, 1991. [LHZ98] H. Lu, C. Hu, and W. Zwaenepoel. OpenMP on networks of workstations. In SC '98, High Performance Networking and Computing Conference, Orlando, Florida, 1998. [Lil92] D. J. Lilja. Architectural Alternatives for Exploiting Parallelism. IEEE Computer Society Press, Los Alamitos, CA, 1992. [Lin83] G. Lindstrom. The key node method: A highly parallel alpha-beta algorithm. Technical Report 83-101, Computer Science Department, University of Utah, Salt Lake City, UT, 1983. [Lin92] Z. Lin. A distributed fair polling scheme applied to or-parallel logic programming. International Journal of Parallel Programming, 20(4), August 1992. [LK72] K. N. Levitt and W. T. Kautz. Cellular arrays for the solution of graph problems. Communications of the ACM, 15(9):789-801, September 1972. [LK85] D. B. Leifker and L. N. Kanal. A hybrid SSS*/alpha-beta algorithm for parallel search of game trees. In Proceedings of the International Joint Conference on Artificial Intelligence, 1044-1046, 1985. [LLG [LM97] E. K. Lee and J. E. Mitchell. Computational experience of an interior-point algorithm in a parallel branch-and-cut framework. In Proceedings for SIAM Conference on Parallel Processing for Scientific Computing, 1997. [LMR88] F. T. Leighton, B. Maggs, and S. K. Rao. Universal packet routing algorithms. In 29th Annual Symposium on Foundations of Computer Science, 256-271, 1988. [Loa92] C. V. Loan. Computational Frameworks for the Fast Fourier Transform. SIAM, Philadelphia, PA, 1992. [LP92] Y. Li and P. M. Pardalos. Parallel algorithms for the quadratic assignment problem. In P. M. Pardalos, editor, Advances in Optimization and Parallel Computing, 177-189. North-Holland, Amsterdam, The Netherlands, 1992. [LPP88] F. Luccio, A. Pietracaprina, and G. Pucci. A probabilistic simulation of PRAMs on a bounded degree network. Information Processing Letters, 28:141-147, July 1988. [LPP89] F. Luccio, A. Pietracaprina, and G. Pucci. A new scheme for deterministic simulation of PRAMs in VLSI. SIAM Journal of Computing, 1989. [LRZ95] C. Leiserson, K. Randall, and Y. Zhou. Cilk: An efficient multithreaded runtime system. In Proceedings of the Fifth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), Santa Barbara, CA, 1995. [LS84] T. H. Lai and S. Sahni. Anomalies in parallel branch and bound algorithms. Communications of the ACM, 594-602, 1984. [LS85] T. H. Lai and A. Sprague. Performance of parallel branch-and-bound algorithms. IEEE Transactions on Computers, C-34(10), October 1985. [LS86] T. H. Lai and A. Sprague. A note on anomalies in parallel branch-and-bound algorithms with one-to-one bounding functions. Information Processing Letters, 23:119-122, October 1986. [LSS88] J. Lee, E. Shragowitz, and S. Sahni. A hypercube algorithm for the 0/1 knapsack problem. Journal of Parallel and Distributed Computing, (5):438-456, 1988. [Lub86] M. Luby. A simple parallel algorithm for the maximal independent set problem. SIAM Journal on Computing, 15(4):1036-1053, 1986. [LW66] E. L. Lawler and D. Woods. Branch-and-bound methods: A survey. Operations Research, 14, 1966. [LW84] G.-J. Li and B. W. Wah. Computational efficiency of parallel approximate branch-and-bound algorithms. In Proceedings of the 1984 International Conference on Parallel Processing, 473-480, 1984. [LW85] G.-J. Li and B. W. Wah. Parallel processing of serial dynamic programming problems. In Proceedings of COMPSAC 85, 81-89, 1985. [LW86] G.-J. Li and B. W. Wah. Coping with anomalies in parallel branch-and-bound algorithms. IEEE Transactions on Computers, C-35, June 1986. [LW95] D. Lenoski and W. D. Weber. Scalable Shared-Memory Multiprocessing. Morgan Kaufmann, San Mateo, CA, 1995. [LY86] M. Li and Y. Yesha. New lower bounds for parallel computations. In Proceedings of 18th ACM Conference on Theory of Computing, 177-187, 1986. [MC82] T. A. Marsland and M. S. Campbell. Parallel search of strongly ordered game trees. Computing Surveys, 14:533-551, 1982. [MD92] A. Mahanti and C. Daniels. SIMD parallel heuristic search. Artificial Intelligence, 1992. [MD93] N. R. Mahapatra and S. Dutt. Scalable duplicate pruning strategies for parallel A* graph search. In Proceedings of the Fifth IEEE Symposium on Parallel and Distributed Processing, 1993. [MdV87] O. A. McBryan and E. F. V. de Velde. Hypercube algorithms and implementations. SIAM Journal on Scientific and Statistical Computing, 8(2):s227-s287, March 1987. [Mes94] Message Passing Interface Forum. MPI: A Message-Passing Interface Standard. Available at http://www.mpi-forum.org. May 1994. [Mes97] Message Passing Interface Forum. MPI-2: Extensions to the Message-Passing Interface. Available at http://www.mpi-forum.org. July 1997. [MFMV90] B. Monien, R. Feldmann, P. Mysliwietz, and O. Vornberger. Parallel game tree search by dynamic tree decomposition. In V. Kumar, P. S. Gopalakrishnan, and L. N. Kanal, editors, Parallel Algorithms for Machine Intelligence and Vision. Springer-Verlag, New York, NY, 1990. [MG89] C. U. Martel and D. Q. Gusfield. A fast parallel quicksort algorithm. Information Processing Letters, 30:97-102, 1989. [Mil91] D. Miller. Exact distributed algorithms for travelling salesman problem. In Proceedings of the Workshop on Parallel Computing of Discrete Optimization Problems, 1991. [MK99] J. Magee and J. Kramer. Concurrency: State Models and Java Programs. John Wiley & Sons, 1999. [MKRS88] R. Miller, V. K. P. Kumar, D. I. Reisis, and Q. F. Stout. Meshes with reconfigurable buses. In Proceedings of MIT Conference on Advanced Research in VLSI, 163-178, 1988. [MM73] A. Martelli and U. Montanari. From dynamic programming to search algorithms with functional costs. In Proceedings of the International Joint Conference on Artifi cial Intelligence, 345-349, 1973. [MM91] P. Messina and A. Murli, editors. Practical Parallel Computing: Status and Prospects. Wiley, Chichester, UK, 1991. [MMR95] B. Mans, T. Mautor, and C. Roucairol. A parallel depth first search branch and bound for the quadratic assignment problem. European Journal of Operational Research, 81(3):617-628, 1995. [Mod88] J. J. Modi. Parallel Algorithms and Matrix Computation. Oxford University Press, Oxford, UK, 1988. [Moh83] J. Mohan. Experience with two parallel programs solving the traveling salesman problem. In Proceedings of the 1983 International Conference on Parallel Processing, 191-193, 1983. [Mol86] C. B. Moler. Matrix computation on distributed-memory multiprocessors. In M. T. Heath, editor, Hypercube Multiprocessors 1986, 181-195. SIAM, Philadelphia, PA, 1986. [Mol87] C. B. Moler. Another look at Amdahl's law. Technical Report TN-02-0587-0288, Intel Scientific Computers, 1987. [Mol93] D. I. Moldovan. Parallel Processing: From Applications to Systems. Morgan Kaufmann, San Mateo, CA, 1993. [MP85] T. A. Marsland and F. Popowich. Parallel game tree search. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-7(4):442-452, July 1985. [MP93] D. L. Miller and J. F. Pekny. The role of performance metrics for parallel mathematical programming algorithms. ORSA Journal on Computing, 5(1), 1993. [MR] D. C. Marinescu and J. R. Rice. On high level characterization of parallelism. Technical Report CSD-TR-1011, CAPO Report CER-90-32, Computer Science Department, Purdue University, West Lafayette, IN. Also published in Journal of Parallel and Distributed Computing, 1993. [MRSR92] G. P. McKeown, V. J. Rayward-Smith, and S. A. Rush. Parallel Branch-and-Bound, 111-150. Advanced Topics in Computer Science. Blackwell Scientific Publications, Oxford, UK, 1992. [MS88] Y. W. E. Ma and D. G. Shea. Downward scalability of parallel architectures. In Proceedings of the 1988 International Conference on Supercomputing, 109-120, 1988. [MS90] G. Manzini and M. Somalvico. Probabilistic performance analysis of heuristic search using parallel hash tables. In Proceedings of the International Symposium on Artificial Intelligence and Mathematics, 1990. [MS96] R. Miller and Q. F. Stout. Parallel Algorithms for Regular Architectures. MIT Press, Cambridge, MA, 1996. [MV84] K. Mehlhorn and U. Vishkin. Randomized and deterministic simulations of PRAMs by parallel machines with restricted granularity of parallel memories. Acta Informatica, 21(4):339-374, November 1984. [MV85] B. Monien and O. Vornberger. The ring machine. Technical report, University of Paderborn, Germany, 1985. Also in Computers and Artificial Intelligence, 3(1987). [MV87] B. Monien and O. Vornberger. Parallel processing of combinatorial search trees. In Proceedings of International Workshop on Parallel Algorithms and Architectures, 1987. [MVS86] B. Monien, O. Vornberger, and E. Spekenmeyer. Superlinear speedup for parallel backtracking. Technical Report 30, University of Paderborn, Germany, 1986. [NA91] D. Nussbaum and A. Agarwal. Scalability of parallel machines. Communications of the ACM, 34(3):57-61, 1991. [Nat90] L. Natvig. Investigating the practical value of Cole's O (log n) time crew pram merge sort algorithm. In 5th International Symposium on Computing and Information Sciences, October 1990. [NBF96] B. Nichols, B. Buttlar, and J. P. Farrell. Pthreads Programming. O'Reilly & Associates, Newton, MA 02164, 1996. [nCU90] nCUBE Corporation. nCUBE 6400 Processor Manual. Beaverton, OR, 1990. [ND96] S. J. Norton and M. D. DiPasquale. Thread time: the multithreaded programming guide. Hewlett-Packard professional books. Prentice-Hall, Englewood Cliffs, NJ 07632, 1996. [Ni91] L. M. Ni. A layered classification of parallel computers. In Proceedings of 1991 International Conference for Young Computer Scientists, 28-33, 1991. [Nic90] J. R. Nickolls. The design of the MasPar MP-1: A cost-effective massively parallel computer. In IEEE Digest of Papers-Comcom, 25-28. IEEE Computer Society Press, Los Alamitos, CA, 1990. [NM93] L. M. Ni and McKinley. A survey of wormhole routing techniques in direct connect networks. IEEE Computer, 26(2), February 1993. [NMB83] D. Nath, S. N. Maheshwari, and P. C. P. Bhatt. Efficient VLSI networks for parallel processing based on orthogonal trees . IEEE Transactions on Computers, C-32:21-23, June 1983. [NS79] D. Nassimi and S. Sahni. Bitonic sort on a mesh connected parallel computer. IEEE Transactions on Computers, C-28(1), January 1979. [NS80] D. Nassimi and S. Sahni. Finding connected components and connected ones on a mesh-connected computer. SIAM Journal of Computing, 9(4):744-757, November 1980. [NS87] A. Norton and A. J. Silberger. Parallelization and performance analysis of the Cooley-Tukey FFT algorithm for shared memory architectures. IEEE Transactions on Computers, C-36(5):581-591, 1987. [NS93] M. Nigam and S. Sahni. Sorting n numbers on n x n reconfigurable meshes with buses. In 7th International Parallel Processing Symposium, 174-181, 1993. [NSF91] Grand Challenges: High Performance Computing and Communications. A Report by the Committee on Physical, Mathematical and Engineering Sciences, NSF/CISE, 1800 G Street NW, Washington, DC, 20550, 1991. [Nug88] S. F. Nugent. The iPSC/2 direct-connect communications technology. In Proceedings of the Third Conference on Hypercubes, Concurrent Computers, and Applications, 51-60, 1988. [Nus82] H. J. Nussbaumer. Fast Fourier Transform and Convolution Algorithms. Springer-Verlag, New York, NY, 1982. [NW88] D. M. Nicol and F. H. Willard. Problem size, parallel architecture, and optimal speedup. Journal of Parallel and Distributed Computing, 5:404-420, 1988. [GOV99] Funding a Revolution: Government Support for Computing Research. Committee on Innovations in Computing and Communications. National Academy Press, 1999. [OR88] J. M. Ortega and C. H. Romine. The ijk forms of factorization methods II: Parallel systems. Parallel Computing, 7:149-162, 1988. [Ort88] J. M. Ortega. Introduction to Parallel and Vector Solution of Linear Systems. Plenum Press, New York, NY, 1988. [OS85] D. P. O'Leary and G. W. Stewart. Data-flow algorithms for parallel matrix computations. Communications of the ACM, 28:840-853, 1985. [OS86] D. P. O'Leary and G. W. Stewart. Assignment and scheduling in parallel matrix factorization. Linear Algebra and its Applications, 77:275-299, 1986. [Pac98] P. Pacheco. Parallel Programming with MPI. Morgan Kaufmann, 1998. [PBG [PC89] P. M. Pardalos and J. Crouse. A parallel algorithm for the quadratic assignment problem. In Supercomputing '89 Proceedings, 351-360. ACM Press, New York, NY, 1989. [PD89] K. H. Park and L. W. Dowdy. Dynamic partitioning of multiprocessor systems. International Journal of Parallel Processing, 18(2):91-120, 1989. [Pea84] J. Pearl. Heuristics-Intelligent Search Strategies for Computer Problem Solving. Addison-Wesley, Reading, MA, 1984. [Per87] R. Perrott. Parallel Programming. Addison-Wesley, Reading, MA, 1987. [Pfi98] G. F. Pfister. In Search of Clusters. Prentice Hall, Englewood Cliffs, NJ, 1998. 2nd Edition. [PFK90] C. Powley, C. Ferguson, and R. Korf. Parallel heuristic search: Two approaches. In V. Kumar, P. S. Gopalakrishnan, and L. N. Kanal, editors, Parallel Algorithms for Machine Intelligence and Vision. Springer-Verlag, New York, NY, 1990. [PG98] T. Q. Pham and P. K. Garg. Multithreaded Programming with Win32. Prentice Hall, 1998. [PH90] D. A. Patterson and J. L. Hennessy. Computer Architecture: A Quantitative Approach. Morgan Kaufmann, San Mateo, CA, 1990. [PH96] D. A. Patterson and J. L. Hennessy. Computer Architecture: A Quantitative Approach, 2nd edition. Morgan Kaufmann, San Mateo, CA, 1996. [PK89] R. C. Paige and C. P. Kruskal. Parallel algorithms for shortest path problems. In Proceedings of 1989 International Conference on Parallel Processing, 14- 19, 1989. [PK95] I. Pramanick and J. G. Kuhl. An inherently parallel method for heuristic problem-solving: Part I - general framework. IEEE Transactions on Parallel and Distributed Systems, 6(10), October 1995. [PKF92] C. Powley, R. Korf, and C. Ferguson. IDA* on the connection machine. Artificial Intelligence, 1992. [Pla89] C. C. Plaxton. Load balancing, selection and sorting on the hypercube. In Proceedings of the 1989 ACM Symposium on Parallel Algorithms and Architectures, 64-73, 1989. [PLRR94] P. M. Pardalos, Y. Li, K. Ramakrishna, and M. Resende. Lower bounds for the quadratic assignment problem. Annals of Operations Research, 50:387-411, 1994. Special Volume on Applications of Combinatorial Optimization. [PR85] V. Pan and J. H. Reif. Efficient parallel solution of linear systems. In 17th Annual ACM Symposium on Theory of Computing, 143-152, 1985. [PR89] P. M. Pardalos and G. P. Rodgers. Parallel branch-and-bound algorithms for unconstrainted quadratic zero-one programming. In R. Sharda et al., editors, Impacts of Recent Computer Advances on Operations Research, 131-143. North-Holland, Amsterdam, The Netherlands, 1989. [PR90] P. M. Pardalos and G. P. Rodgers. Parallel branch-and-bound algorithms for quadratic zero-one programming on a hypercube architecture. Annals of Operations Research, 22:271-292, 1990. [PR91] M. Padberg and G. Rinaldi. A branch-and-cut algorithm for the resolution of large-scale symmetric traveling salesman problems. SIAM Review, 33:60- 100, 1991. [Pri57] R. C. Prim. Shortest connection network and some generalizations. Bell Systems Technical Journal, 36:1389-1401, 1957. [PRV88] G. Plateau, C. Roucairol, and I. Valabregue. Algorithm PR2 for the parallel size reduction of the 0/1 multiknapsack problem. In INRIA Rapports de Recherche, number 811, 1988. [PS82] C. H. Papadimitriou and K. Steiglitz. Combinatorial Optimization: Algorithms and Complexity. Prentice-Hall, Englewood Cliffs, NJ, 1982. [PV80] F. P. Preparata and J. Vuillemin. Area-time optimal VLSI networks for matrix multiplication. In Proceedings of the 14th Princeton Conference on Information Science and Systems, 300-309, 1980. [PY88] C. H. Papadimitriou and M. Yannakakis. Towards an architecture independent analysis of parallel algorithms. In Proceedings of 20th ACM Symposium on Theory of Computing, 510-513, 1988. [QD86] M. J. Quinn and N. Deo. An upper bound for the speedup of parallel branch-and-bound algorithms. BIT, 26(1), March 1986. [Qui88] M. J. Quinn. Parallel sorting algorithms for tightly coupled multiprocessors. Parallel Computing, 6:349-357, 1988. [Qui89] M. J. Quinn. Analysis and implementation of branch-and-bound algorithms on a hypercube multicomputer. IEEE Transactions on Computers, 1989. [Qui94] M. J. Quinn. Parallel Computing: Theory and Practice. McGraw-Hill, New York, NY, 1994. [Ram97] V. Ramachandran. Qsm: A general purpose shared-memory model for parallel computation. In Foundations of Software Technology and Theoretical Computer Science, 1-5, 1997. [Ran89] A. G. Ranade. Fluent Parallel Computation. Ph.D. Thesis, Department of Computer Science, Yale University, New Haven, CT, 1989. [Ran91] A. G. Ranade. Optimal speedup for backtrack search on a butterfly network. In Proceedings of the Third ACM Symposium on Parallel Algorithms and Architectures, 1991. [Rao90] V. N. Rao. Parallel Processing of Heuristic Search. Ph.D. Thesis, University of Texas, Austin, TX, 1990. [Ras78] L. Raskin. Performance Evaluation of Multiple Processor Systems. Ph.D. Thesis, Carnegie-Mellon University, Pittsburgh, PA, 1978. [RB76] C. M. Rader and N. M. Brenner. A new principle for Fast fourier transform. IEEE Transactions on Acoustics, Speech and Signal Processing, 24:264-265, 1976. [RDK89] C. Renolet, M. Diamond, and J. Kimbel. Analytical and heuristic modeling of distributed algorithms. Technical Report E3646, FMC Corporation, Advanced Systems Center, Minneapolis, MN, 1989. [Rei81] R. Reischuk. Probabilistic algorithms for sorting and selection. SIAM Journal of Computing, 396-409, 1981. [RF89] D. A. Reed and R. M. Fujimoto. Multicomputer Networks: Message-Based Parallel Processing. MIT Press, Cambridge, MA, 1989. [RICN88] K. Rokusawa, N. Ichiyoshi, T. Chikayama, and H. Nakashima. An efficient termination detection and abortion algorithm for distributed processing systems. In Proceedings of 1988 International Conference on Parallel Processing: Vol. I, 18-22, 1988. [RK87] V. N. Rao and V. Kumar. Parallel depth-first search, part I: Implementation. International Journal of Parallel Programming, 16(6):479-499, 1987. [RK88a] V. N. Rao and V. Kumar. Concurrent access of priority queues. IEEE Transactions on Computers, C-37 (12), 1988. [RK88b] V. N. Rao and V. Kumar. Superlinear speedup in state-space search. In Proceedings of the 1988 Foundation of Software Technology and Theoretical Computer Science, number 338, 161-174. Springer-Verlag Series Lecture Notes in Computer Science, 1988. [RK93] V. N. Rao and V. Kumar. On the efficicency of parallel backtracking. IEEE Transactions on Parallel and Distributed Systems, 4(4):427-437, April 1993. available as a technical report TR 90-55, Computer Science Department, University of Minnesota. [RKR87] V. N. Rao, V. Kumar, and K. Ramesh. A parallel implementation of iterative-deepening-A*. In Proceedings of the National Conference on Artificial Intelligence (AAAI-87), 878-882, 1987. [RND77] E. M. Reingold, J. Nievergelt, and N. Deo. Combinatorial Algorithms: Theory and Practice. Prentice-Hall, Englewood Cliffs, NJ, 1977. [RO88] C. H. Romine and J. M. Ortega. Parallel solution of triangular systems of equations. Parallel Computing, 6:109-114, 1988. [Rob75] M. O. Robin. Probabilistic algorithms. In J. Traub, editor, Algorithms and Complexity: New Directions and Recent Results, 21-39. Academic Press, San Diego, CA, 1975. [Rob90] Y. Robert. The Impact of Vector and Parallel Architectures on Gaussian Elimination. John Wiley and Sons, New York, NY, 1990. [Rom87] C. H. Romine. The parallel solution of triangular systems on a hypercube. In M. T. Heath, editor, Hypercube Multiprocessors 1987, 552-559. SIAM, Philadelphia, PA, 1987. [Rou87] C. Roucairol. A parallel branch-and-bound algorithm for the quadratic assignment problem. Discrete Applied Mathematics, 18:211-225, 1987. [Rou91] C. Roucairol. Parallel branch-and-bound on shared-memory multiprocessors. In Proceedings of the Workshop On Parallel Computing of Discrete Optimization Problems, 1991. [RRRR96] K. A. Robbins, S. Robbins, K. R. Robbins, and S. Robbins. Practical UNIX Programming: A Guide to Concurrency, Communication, and Multithreading. Prentice Hall, 1996. [RS90a] A. P. R. Alverson, D. Callahan, D. Cummings, B. Koblenz and B. Smith. The tera computer system. In International Conference on Supercomputing, 1-6, 1990. [RS90b] S. Ranka and S. Sahni. Hypercube Algorithms for Image Processing and Pattern Recognition. Springer-Verlag, New York, NY, 1990. [RV87] J. H. Reif and L. G. Valiant. A logarithmic time sort for linear size networks. Journal of the ACM, 34(1):60-76, January 1987. [Ryt88] W. Rytter. Efficient parallel computations for dynamic programming. Theoretical Computer Science, 59:297-307, 1988. [RZ89] M. Reeve and S. E. Zenith, editors. Parallel Processing and Artificial Intelligence. Wiley, Chichester, UK, 1989. [Saa86] Y. Saad. Communication complexity of the Gaussian elimination algorithm on multiprocessors. Linear Algebra and its Applications, 77:315-340, 1986. [SB77] H. Sullivan and T. R. Bashkow. A large scale, homogeneous, fully distributed parallel machine. In Proceedings of Fourth Symposium on Computer Architecture, 105-124, March 1977. [SBCV90] R. H. Saavedra-Barrera, D. E. Culler, and T. Von Eiken. Analysis of multithreaded architectures for parallel computing. Report UCB/CSD 90/569, University of California, Berkeley, Computer Science Division, Berkeley, CA, April 1990. [Sch80] J. T. Schwartz. Ultracomputers. ACM Transactions on Programming Languages and Systems, 2:484-521, October 1980. [Sed78] R. Sedgewick. Implementing quicksort programs. Communications of the ACM, 21(10):847-857, 1978. [Sei85] C. L. Seitz. The cosmic cube. Communications of the ACM, 28(1):22-33, 1985. [Sei89] S. R. Seidel. Circuit-switched vs. store-and-forward solutions to symmetric communication problems. In Proceedings of the Fourth Conference on Hypercubes, Concurrent Computers, and Applications, 253-255, 1989. [Sei92] C. L. Seitz. Mosaic C: An experimental fine-grain multicomputer. Technical report, California Institute of Technology, Pasadena, CA, 1992. [SG88] S. R. Seidel and W. L. George. Binsorting on hypercube with d-port communication. In Proceedings of the Third Conference on Hypercube Concurrent Computers, 1455-1461, January 1988. [SG91] X.-H. Sun and J. L. Gustafson. Toward a better parallel performance metric. Parallel Computing, 17:1093-1109, December 1991. Also available as Technical Report IS-5053, UC-32, Ames Laboratory, Iowa State University, Ames, IA. [Sha85] J. A. Sharp. Data-Flow Computing. Ellis Horwood, Chichester, UK, 1985. [She59] D. L. Shell. A high-speed sorting procedure. Communications of the ACM, 2(7):30-32, July 1959. [SHG93] J. P. Singh, J. L. Hennessy, and A. Gupta. Scaling parallel programs for multiprocessors: Methodology and examples. IEEE Computer, 26(7):42-50, 1993. [Sie77] H. J. Siegel. The universality of various types of SIMD machine interconnection networks. In Proceedings of the 4th Annual Symposium on Computer Architecture , 23-25, 1977. [Sie85] H. J. Siegel. Interconnection Networks for Large-Scale Parallel Processing. D. C. Heath, Lexington, MA, 1985. [SJ81] C. Savage and J. Jaja. Fast, efficient parallel algorithms for some graph problems. SIAM Journal of Computing, 10(4):682-690, November 1981. [SK89] W. Shu and L. V. Kale. A dynamic scheduling strategy for the chare-kernel system. In Proceedings of Supercomputing Conference, 389-398, 1989. [SK90] V. Saletore and L. V. Kale. Consistent linear speedup to a first solution in parallel state-space search. In Proceedings of the 1990 National Conference on Artificial Intelligence, 227-233, August 1990. [SKAT91a] V. Singh, V. Kumar, G. Agha, and C. Tomlinson. Efficient algorithms for parallel sorting on mesh multicomputers. International Journal of Parallel Programming, 20(2):95-131, 1991. [SKAT91b] V. Singh, V. Kumar, G. Agha, and C. Tomlinson. Scalability of parallel sorting on mesh multicomputers. International Journal of Parallel Programming, 20(2), 1991. [SM86] S. J. Stolfo and D. P. Miranker. The DADO production system machine. Journal of Parallel and Distributed Computing, 3:269-296, June 1986. [Smi84] D. R. Smith. Random trees and the analysis of branch and bound procedures. Journal of the ACM, 31(1), 1984. [SN90] X.-H. Sun and L. M. Ni. Another view of parallel speedup. In Supercomputing '90 Proceedings, 324-333, 1990. [SN93] X.-H. Sun and L. M. Ni. Scalable problems and memory-bounded speedup. Journal of Parallel and Distributed Computing, 19:27-37, September 1993. [Sni82] M. Snir. On parallel search. In Proceedings of Principles of Distributed Computing, 242-253, 1982. [Sni85] M. Snir. On parallel searching. SIAM Journal of Computing, 14(3):688-708, August 1985. [Sny86] L. Snyder. Type architectures, shared-memory and the corollary of modest potential. Annual Review of Computer Science, 1:289-317, 1986. [SOHL [Sol77] M. Sollin. An algorithm attributed to Sollin. In S. Goodman and S. Hedetniemi, editors, Introduction to The Design and Analysis of Algorithms. McGraw-Hill, Cambridge, MA, 1977. [SR91] X.-H. Sun and D. T. Rover. Scalability of parallel algorithm-machine combinations. Technical Report IS-5057, Ames Laboratory, Iowa State University, Ames, IA, 1991. Also published in IEEE Transactions on Parallel and Distributed Systems. [SS88] Y. Saad and M. H. Schultz. Topological properties of hypercubes. IEEE Transactions on Computers, 37:867-872, 1988. [SS89a] Y. Saad and M. H. Schultz. Data communication in hypercubes. Journal of Parallel and Distributed Computing, 6:115-135, 1989. Also available as Technical Report YALEU/DCS/RR-428 from the Department of Computer Science, Yale University, New Haven, CT. [SS89b] Y. Saad and M. H. Schultz. Data communication in parallel architectures. Parallel Computing, 11:131-150, 1989. [SS90] H. Shi and J. Schaeffer. Parallel sorting by regular sampling. Journal of Parallel and Distributed Computing, 14:361-372, 1990. [ST95] B. M. S. Tschvke, R. Lling. Solving the traveling salesman problem with a distributed branch-and-bound algorithm on a 1024 processor network. In Proceedings of the 9th International Parallel Processing Symposium, 182- 189, Santa Barbara, CA, April 1995. [Sto71] H. S. Stone. Parallel processing with the perfect shuffle. IEEE Transactions on Computers, C-20(2):153-161, 1971. [Sto93] H. S. Stone. High-Performance Computer Architecture: Third Edition. Addison-Wesley, Reading, MA, 1993. [Sun95] SunSoft. Solaris multithreaded programming guide. SunSoft Press, Mountainview, CA, 1995. [Sup91] Supercomputer Systems Division, Intel Corporation. Paragon XP/S Product Overview. Beaverton, OR, 1991. [SV81] Y. Shiloach and U. Vishkin. Finding the maximum, merging and sorting in a parallel computation model. Journal of Algorithms, 88-102, 1981. [SV82]
Y. Shiloach and
U. Vishkin. An O (n [SW87] Q. F. Stout and B. A. Wagar. Passing messages in link-bound hypercubes. In M. T. Heath, editor, Hypercube Multiprocessors 1987, 251-257. SIAM, Philadelphia, PA, 1987. [Swa87] P. N. Swarztrauber. Multiprocessor FFTs. Parallel Computing, 5:197-210, 1987. [SZ96] X.-H. Sun and J. P. Zhu. Performance considerations of shared virtual memory machines. IEEE Trans. on Parallel and Distributed Systems, 6(11):1185- 1194, 1996. [Tab90] D. Tabak. Multiprocessors. Prentice-Hall, Englewood Cliffs, NJ, 1990. [Tab91] D. Tabak. Advanced Multiprocessors. McGraw-Hill, New York, NY, 1991. [Tak87] R. Take. An optimal routing method of all-to-all communication on hypercube networks. In 35th Information Processing Society of Japan, 1987. [TEL95] D. M. Tullsen, S. Eggers, and H. M. Levy. Simultaneous multithreading: Maximizing on-chip parallelism. In Proceedings of the 22nd Annual International Symposium on Computer Architecture, 1995. [Ten90] S. Teng. Adaptive parallel algorithms for integral knapsack problems. Journal of Parallel and Distributed Computing, 8:400-406, 1990. [Thi90] Thinking Machines Corporation. The CM-2 Technical Summary. Cambridge, MA. 1990. [Thi91] Thinking Machines Corporation. The CM-5 Technical Summary. Cambridge, MA. 1991. [Tho83] C. D. Thompson. Fourier transforms in VLSI. IBM Journal of Research and Development, C-32(11):1047-1057, 1983. [Thr99] J. Throop. Standards: OpenMP: Shared-memory parallelism from the ashes. Computer, 32(5):108-109, May 1999. [Tic88] W. F. Tichy. Parallel matrix multiplication on the connection machine. Technical Report RIACS TR 88.41, Research Institute for Advanced Computer Science, NASA Ames Research Center, Moffet Field, CA, 1988. [TK77] C. D. Thompson and H. T. Kung. Sorting on a mesh-connected parallel computer. Communications of the ACM, 21:263-271, 1977. [TL90] Z. Tang and G.-J. Li. Optimal granularity of grid iteration problems. In Proceedings of the 1990 International Conference on Parallel Processing, I111-I118, 1990. [Tul96] D. M. Tullsen. Simultaneous multithreading. Ph.D. Thesis, University of Washington, Seattle, WA, 1996. [TV85] R. E. Tarjan and U. Vishkin. An efficient parallel biconnectivity algorithm. SIAM Journal on Computing, 14(4):862-874, November 1985. [TY96] J.-Y. Tsai and P.-C. Yew. The superthreaded architecture: Thread pipelining with run-time data dependence checking and control speculation. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 35-46, 1996. [Upf84] E. Upfal. A probabilistic relation between desirable and feasible models of parallel computation. In Proceedings of the 16th ACM Conference on Theory of Computing, 258-265, 1984. [UW84] E. Upfal and A. Widgerson. How to share memory in a distributed system. In Proceedings of the 25th Annual Symposium on the Foundation of Computer Science, 171-180, 1984. [Val75] L. G. Valiant. Parallelism in comparison problems. SIAM Journal of Computing, 4(3):348-355, September 1975. [Val82] L. G. Valiant. A scheme for fast parallel communication. SIAM Journal on Computing, 11:350-361, 1982. [Val90a] L. G. Valiant. A bridging model for parallel computation. Communications of the ACM, 33(8), 1990. [Val90b] L. G. Valiant. General purpose parallel architectures. Handbook of Theoretical Computer Science, 1990. [Vav89] S. A. Vavasis. Gaussian elimination with pivoting is P-complete. SIAM Journal on Discrete Mathematics, 2:413-423, 1989. [VB81] L. G. Valiant and G. J. Brebner. Universal schemes for parallel communication. In Proceedings of the 13th ACM Symposium on Theory of Computation, 263-277, 1981. [VC89] F. A. Van-Catledge. Towards a general model for evaluating the relative performance of computer systems. International Journal of Supercomputer Applications, 3(2):100-108, 1989. [Vor86] O. Vornberger. Implementing branch-and-bound in a ring of processors. Technical Report 29, University of Paderborn, Germany, 1986. [Vor87a] O. Vornberger. The personal supercomputer: A network of transputers. In Proceedings of the 1987 International Conference on Supercomputing, 1987. [Vor87b] O. Vornberger. Load balancing in a network of transputers. In Proceedings of the Second International Workshop on Distributed Parallel Algorithms, 1987. [VS86] J. S. Vitter and R. A. Simmons. New classes for parallel complexity: A study of unification and other complete problems for P. IEEE Transactions on Computers, May 1986. [VSBR83] L. G. Valiant, S. Skyum, S. Berkowitz, and C. Rackoff. Fast parallel computation of polynomials using few processors. SIAM Journal of Computing, 12(4):641-644, 1983. [WA98] B. Wilkinson and C. M. Allen. Parallel Programming: Techniques and Applications Using Networked Workstations and Parallel Computers. Prentice Hall, 1998. [WA99] B. Wilkinson and M. Allen. Parallel Programming. Prentice-Hall, NJ, 1999. [Wag87] B. A. Wagar. Hyperquicksort: A fast sorting algorithm for hypercubes. In Proceedings of the Second Conference on Hypercube Multiprocessors, 292- 299, 1987. [Wal91] Y. Wallach. Parallel Processing and Ada. Prentice-Hall, Englewood Cliffs, NJ, 1991. [WB72] W. A. Wulf and C. G. Bell. C.mmp-a multimicroprocessor. In Proceedings of AFIPS Conference, 765-777, 1972. [Wei97] B. Weissman. Active threads manual. Technical Report TR-97-037, International Computer Science Institute, Berkeley, CA 94704, 1997. [WF80] C. L. Wu and T. Y. Feng. On a class of multistage interconnection networks. IEEE Transactions on Computers, 669-702, August 1980. [WF84] C. L. Wu and T. Y. Feng. Interconnection Networks for Parallel and Distributed Processing. IEEE Computer Society Press, Washington, DC, 1984. [WI89] K. Wada and N. Ichiyoshi. A distributed shortest path algorithm and its mapping on the Multi-PSI. In Proceedings of International Conference of Parallel Processing, 1989. [Wil86] H. S. Wilf. Algorithms and Complexity. Prentice-Hall, NJ, 1986. [Wil95] G. V. Wilson. Practical Parallel Programming. MIT Press, Cambridge, MA, 1995. [Wil00] A. Williams. Windows 2000 Systems Programming Black Book. The Coriolis Group, 2000. [Win77] S. Winograd. A new method for computing DFT. In IEEE International Conference on Acoustics, Speech and Signal Processing, 366-368, 1977. [WL88] B. W. Wah and G.-J. Li. Systolic processing for dynamic programming problems. Circuits, Systems, and Signal Processing, 7(2):119-149, 1988. [WLY84] B. W. Wah, G.-J. Li, and C. F. Yu. The status of MANIP-a multicomputer architecture for solving combinatorial extremum-search problems. In Proceedings of 11th Annual International Symposium on Computer Architecture, 56-63, 1984. [WM84] B. W. Wah and Y. W. E. Ma. MANIP-a multicomputer architecture for solving combinatorial extremum-search problems. IEEE Transactions on Computers, C-33, May 1984. [Woo86] J. V. Woods, editor. Fifth Generation Computer Architectures. North-Holland, Amsterdam, The Netherlands, 1986. [Wor88] P. H. Worley. Information Requirements and the Implications for Parallel Computation. Ph.D. Thesis, Stanford University, Department of Computer Science, Palo Alto, CA, 1988. [Wor90] P. H. Worley. The effect of time constraints on scaled speedup. SIAM Journal on Scientific and Statistical Computing, 11(5):838-858, 1990. [Wor91] P. H. Worley. Limits on parallelism in the numerical solution of linear PDEs. SIAM Journal on Scientific and Statistical Computing, 12:1-35, January 1991. [WS88] Y. Won and S. Sahni. A balanced bin sort for hypercube multiprocessors. Journal of Supercomputing, 2:435-448, 1988. [WS89] J. Woo and S. Sahni. Hypercube computing: Connected components. Journal of Supercomputing, 3:209-234, 1989. [WS91] J. Woo and S. Sahni. Computing biconnected components on a hypercube. Journal of Supercomputing, June 1991. Also available as Technical Report TR 89-7 from the Department of Computer Science, University of Minnesota, Minneapolis, MN. [WY85] B. W. Wah and C. F. Yu. Stochastic modeling of branch-and-bound algorithms with best-first search. IEEE Transactions on Software Engineering, SE-11, September 1985. [Zho89] X. Zhou. Bridging the gap between Amdahl's law and Sandia laboratory's result. Communications of the ACM, 32(8):1014-5, 1989. [Zom96] A. Zomaya, editor. Parallel and Distributed Computing Handbook. McGraw-Hill, 1996. [ZRV89] J. R. Zorbas, D. J. Reble, and R. E. VanKooten. Measuring the scalability of parallel computer systems. In Supercomputing '89 Proceedings, 832-841, 1989. |