## 6.4 Topologies and Embedding

MPI views the processes as being arranged in a one-dimensional topology and uses a linear ordering to number them. However, in many parallel programs, processes are naturally arranged in higher-dimensional topologies (e.g., two- or three-dimensional). In such programs, both the computation and the set of interacting processes are naturally identified by their coordinates in that topology. For example, in a parallel program in which the processes are arranged in a two-dimensional topology, process (i, j) may need to send a message to (or receive a message from) process (k, l). To implement these programs in MPI, we need to map each MPI process to a process in that higher-dimensional topology. Many such mappings are possible. Figure 6.5 illustrates some possible mappings of 16 MPI processes onto a 4 x 4 two-dimensional topology. For example, in the mapping shown in Figure 6.5(a), an MPI process with rank rank corresponds to process (row, col) in the grid such that row = rank/4 and col = rank%4 (where '%' is C's modulo operator). As an illustration, the process with rank 7 is mapped to process (1, 3) in the grid.

## Figure 6.5. Different ways to map a set of processes to a two-dimensional grid. (a) and (b) show a row- and column-wise mapping of these processes, (c) shows a mapping that follows a space-filling curve (dotted line), and (d) shows a mapping in which neighboring processes are directly connected in a hypercube.

In general, the goodness of a mapping is determined by the pattern of interaction among the processes in the higher-dimensional topology, the connectivity of the physical processors, and the mapping of MPI processes to physical processors. For example, consider a program that uses a two-dimensional topology in which each process needs to communicate with its neighboring processes along the x and y directions of this topology.
Now, if the processors of the underlying parallel system are connected using a hypercube interconnection network, then the mapping shown in Figure 6.5(d) is better, since neighboring processes in the grid are also neighboring processors in the hypercube topology. However, the mechanism used by MPI to assign ranks to the processes in a communication domain does not use any information about the interconnection network, making it impossible to perform topology embeddings in an intelligent manner. Furthermore, even if we had that information, we would need to specify different mappings for different interconnection networks, diminishing the architecture-independence advantages of MPI. A better approach is to let the library itself compute the most appropriate embedding of a given topology onto the processors of the underlying parallel computer. This is exactly the approach facilitated by MPI. MPI provides a set of routines that allows the programmer to arrange the processes in different topologies without having to explicitly specify how these processes are mapped onto the processors. It is up to the MPI library to find the most appropriate mapping that reduces the cost of sending and receiving messages.

## 6.4.1 Creating and Using Cartesian Topologies

MPI provides routines that allow the specification of virtual process topologies of arbitrary connectivity in terms of a graph. Each node in the graph corresponds to a process, and two nodes are connected if they communicate with each other. Graphs of processes can be used to specify any desired topology. However, the topologies most commonly used in message-passing programs are one-, two-, or higher-dimensional grids, also referred to as Cartesian topologies. For this reason, MPI provides a set of specialized routines for specifying and manipulating this type of multidimensional grid topology.
MPI's function for describing Cartesian topologies is called MPI_Cart_create:

```c
int MPI_Cart_create(MPI_Comm comm_old, int ndims, int *dims,
                    int *periods, int reorder, MPI_Comm *comm_cart)
```

This function takes the group of processes that belong to the communicator comm_old and creates a virtual process topology. The topology information is attached to a new communicator comm_cart that is returned by the function. The shape of the topology is specified by the number of dimensions ndims and by the array dims, whose i-th entry stores the number of processes along dimension i. The array periods specifies whether or not the topology has wraparound connections: if periods[i] is non-zero, then dimension i is periodic. Finally, the reorder parameter determines whether the processes in the new communicator may be reordered (i.e., assigned new ranks) so that the virtual topology is mapped well onto the physical processors. All subsequent communication that uses the Cartesian topology must be performed through comm_cart.
**Process Naming**
When a Cartesian topology is used, each process is better identified by its coordinates in this topology. However, all MPI functions that we described for sending and receiving messages require that the source and the destination of each message be specified using the rank of the process. For this reason, MPI provides two functions for converting between coordinates and ranks:

```c
int MPI_Cart_rank(MPI_Comm comm_cart, int *coords, int *rank)
int MPI_Cart_coords(MPI_Comm comm_cart, int rank, int maxdims, int *coords)
```

The MPI_Cart_rank function takes the coordinates of a process, stored in the array coords, and returns its rank in rank. The MPI_Cart_coords function is the inverse: it takes the rank of a process and returns its Cartesian coordinates in the array coords, of maximum size maxdims.

Frequently, the communication performed among processes in a Cartesian topology is that of shifting data along a dimension of the topology. MPI provides the function

```c
int MPI_Cart_shift(MPI_Comm comm_cart, int dir, int s_step,
                   int *rank_source, int *rank_dest)
```

The direction of the shift is specified in the dir parameter, which names the dimension along which data are shifted, and the size of the shift step is specified in the s_step parameter. For the calling process, the function computes the rank rank_source of the process from which it will receive data and the rank rank_dest of the process to which it will send data during such a shift. If the dimension is not periodic and the shift falls off the edge of the topology, the corresponding rank is set to MPI_PROC_NULL.

## 6.4.2 Example: Cannon's Matrix-Matrix Multiplication

To illustrate how the various topology functions are used, we will implement Cannon's algorithm for multiplying two matrices A and B, described in Section 8.2.2. Cannon's algorithm views the processes as being arranged in a virtual two-dimensional square array. It uses this array to distribute the matrices A, B, and the result matrix C in a block fashion. That is, if n x n is the size of each matrix and p is the total number of processes, then each matrix is divided into square blocks of size n/√p x n/√p (assuming that p is a perfect square). Now, process (i, j) in the grid initially stores the (i, j) blocks of A, B, and C. The algorithm first performs an alignment step: the blocks of A in row i are cyclically shifted i positions to the left, and the blocks of B in column j are cyclically shifted j positions up. The algorithm then proceeds in √p steps. In each step, every process multiplies its current blocks of A and B, accumulates the product into its block of C, and then shifts its block of A one position to the left and its block of B one position up, with wraparound. Program 6.2 shows the MPI function that implements Cannon's algorithm.
The dimension of the matrices is supplied in the parameter n; the parameters a, b, and c point to the locally stored blocks of the matrices A, B, and C, respectively.

## Program 6.2 Cannon's Matrix-Matrix Multiplication with MPI's Topologies

```c
#include <math.h>
#include <mpi.h>

void MatrixMultiply(int n, double *a, double *b, double *c);

void MatrixMatrixMultiply(int n, double *a, double *b, double *c,
                          MPI_Comm comm)
{
  int i;
  int nlocal;
  int npes, dims[2], periods[2];
  int myrank, my2drank, mycoords[2];
  int uprank, downrank, leftrank, rightrank;
  int shiftsource, shiftdest;
  MPI_Status status;
  MPI_Comm comm_2d;

  /* Get the communicator related information */
  MPI_Comm_size(comm, &npes);
  MPI_Comm_rank(comm, &myrank);

  /* Set up the Cartesian topology */
  dims[0] = dims[1] = (int)sqrt(npes);

  /* Set the periods for wraparound connections */
  periods[0] = periods[1] = 1;

  /* Create the Cartesian topology, with rank reordering */
  MPI_Cart_create(comm, 2, dims, periods, 1, &comm_2d);

  /* Get the rank and coordinates with respect to the new topology */
  MPI_Comm_rank(comm_2d, &my2drank);
  MPI_Cart_coords(comm_2d, my2drank, 2, mycoords);

  /* Compute ranks of the up and left shifts */
  MPI_Cart_shift(comm_2d, 0, -1, &rightrank, &leftrank);
  MPI_Cart_shift(comm_2d, 1, -1, &downrank, &uprank);

  /* Determine the dimension of the local matrix block */
  nlocal = n/dims[0];

  /* Perform the initial matrix alignment. First for A and then for B */
  MPI_Cart_shift(comm_2d, 0, -mycoords[0], &shiftsource, &shiftdest);
  MPI_Sendrecv_replace(a, nlocal*nlocal, MPI_DOUBLE, shiftdest, 1,
                       shiftsource, 1, comm_2d, &status);

  MPI_Cart_shift(comm_2d, 1, -mycoords[1], &shiftsource, &shiftdest);
  MPI_Sendrecv_replace(b, nlocal*nlocal, MPI_DOUBLE, shiftdest, 1,
                       shiftsource, 1, comm_2d, &status);

  /* Get into the main computation loop */
  for (i=0; i<dims[0]; i++) {
    MatrixMultiply(nlocal, a, b, c); /* c = c + a*b */

    /* Shift matrix a left by one */
    MPI_Sendrecv_replace(a, nlocal*nlocal, MPI_DOUBLE, leftrank, 1,
                         rightrank, 1, comm_2d, &status);

    /* Shift matrix b up by one */
    MPI_Sendrecv_replace(b, nlocal*nlocal, MPI_DOUBLE, uprank, 1,
                         downrank, 1, comm_2d, &status);
  }

  /* Restore the original distribution of a and b */
  MPI_Cart_shift(comm_2d, 0, +mycoords[0], &shiftsource, &shiftdest);
  MPI_Sendrecv_replace(a, nlocal*nlocal, MPI_DOUBLE, shiftdest, 1,
                       shiftsource, 1, comm_2d, &status);

  MPI_Cart_shift(comm_2d, 1, +mycoords[1], &shiftsource, &shiftdest);
  MPI_Sendrecv_replace(b, nlocal*nlocal, MPI_DOUBLE, shiftdest, 1,
                       shiftsource, 1, comm_2d, &status);

  MPI_Comm_free(&comm_2d); /* Free up communicator */
}

/* This function performs a serial matrix-matrix multiplication c = c + a*b */
void MatrixMultiply(int n, double *a, double *b, double *c)
{
  int i, j, k;

  for (i=0; i<n; i++)
    for (j=0; j<n; j++)
      for (k=0; k<n; k++)
        c[i*n+j] += a[i*n+k]*b[k*n+j];
}
```