     ### Chapter 2 - Designing Threaded Programs

Bradford Nichols, Dick Buttlar and Jacqueline Proulx Farrell Example: A Matrix Multiplication Program In this section we'll look at a program very different from the ATM client/server example: one that exemplifies how you can break down a program into tasks. Whereas we used the boss/worker model to design our ATM server, we'll use the peer model for this one. A large class of programs are computationally intensive and work on large sets of data: image processing, statistical analysis, and finite element modeling, to name a few. In some cases these programs may require I/O to databases or to multiple devices. Our example uses a simple matrix-multiply program to look at the peer design model, which is commonly used in these programs. Matrix multiplication takes two two-dimensional input arrays of data and computes a third. If you remember your matrix algebra, the multiplication goes like this:   A program that performs a matrix multiplication must compute the value of every element in the result array. If the program is nonthreaded, the total time for the program is the time it takes to compute an individual element multiplied by the number of elements. For other programs in this class, the operation on each element of input may not be specifically multiplicationperhaps encryption, translation, or comparison. Also, the input data may not always be a well-formed array. However, all will have the characteristic of repeating some basic operation over and over again on subsets of their data. We can improve the performance of these programs using threads in two ways: by providing overlapping I/O and by parallel processing. If processing the input elements required I/O, threading would allow the program to continue while one thread blocked waiting for I/O completion (see Figure 2-8). If the input and output arrays of our matrix-multiply program were stored on disk (or even were the input from individual remote sensors), threads could block individually on the I/O operations they needed to complete while other threads continued.   Figure 2-8: Improving performance with overlapping I/O  When our matrix-multiply program is run on a multiprocessing system, as shown in Figure 2-9, the threads assigned to different elements of the matrix could run in parallel on different CPUs, thus decreasing the time it takes for the program to complete.   Figure 2-9: Improving performance with parallel processing  Although our matrix-multiply program is a gross simplification of these kinds of programs, it is still a useful example of the benefits of threading. Our program will have small fixed-sized in-memory arrays; it has no I/O, so we won't be demonstrating overlapping I/O in this case. Also, the computation time for an element is so short in comparison to the setup time and overhead of a thread that, even if you ran it on a multiprocessing system, you might not notice a performance improvement. The Serial Matrix-Multiply Program Before we develop a threaded version of this program, let's look at the serial version in Example 2-9. Example 2-9: Serial Matrix-Multiply Program (matrix_serial.c) #include
 #define ARRAY_SIZE 10
 typedef int matrix_t[ARRAY_SIZE][ARRAY_SIZE];
 matrix_t MA,MB,MC;
 /* Routine to multiply a row by a column and place element in the result matrix. */
 void mult(int size,             /* size of the matrix */
 int row,              /* row of result to compute */
 int column,           /* column of result to compute */
 matrix_t MA,          /* input matrix */
 matrix_t MB,          /* input matrix */
 matrix_t MC) {       /* result matrix */
 int position;
 MC[row][column] = 0;
 for(position = 0; position < size; position++) {
 MC[row][column] = MC[row][column] +
 ( MA[row][position]  *  MB[position][column] ) ;
 }
 }
 /* Main: allocates matrix, assigns values, computes the results */
 extern int
 main(void) {
 int size = ARRAY_SIZE, row, column;
 /* Fill in matrix values */
 .
 .
 .
 /* Process matrix, by row, column */
 for(row = 0; row < size; row++)     {
 for (column = 0; column < size; column++) {
 mult(size, row, column, MA, MB, MC);
 }
 }
 /* Print matrix */
 printf("MATRIX: The resulting matrix C is:\n");
 for(row = 0; row < size; row ++) {
 for (column = 0; column < size; column++) {
 printf("%5d ",MC[row][column]);
 }
 printf("\n");
 }
 return 0
 } The arrays are named MA, MB, and MC (MA x MB = MC). The mult routine computes the result for an individual element in MC by multiplying the proper elements of MA by MB and adds the products. In the main program, a loop calls this routine for each element of MC. The Multithreaded Matrix-Multiply Program For the threaded version in Example 2-10, we'll use the peer model to organize the program's threads. We'll create a peer thread for each individual element in the result array MC and assign it to compute the result. A main thread will also existnot so much as a peer thread but as a setup and cleanup thread. It performs all of the setup tasks for the program, creates the peer threads, and waits for them to complete. When they do, the main thread prints the results and terminates the program.* * This design might cause a problem on some systems when the number of threads that must be created to handle a very large matrix swamp the system. A more sophisticated solution would be to limit the number of created threads based on the number of available CPUs. Example 2-10: Multithreaded Matrix-Multiply Program main Routine /* main: allocates matrix, assigns values, computes the results */
 .
 .
 .
 typedef struct {
 int       id;
 int       size;
 int       row;
 int       column;
 matrix_t  *MA,
 matrix_t  *MB,
 matrix_t  *MC;
 } matrix_work_order_t;
 .
 .
 extern int
 main(void) {
 int size = ARRAY_SIZE, row, column;
 matrix_t MA,MB,MC;
 matrix_work_order_t *work_orderp;
 .
 .
 .
 /* Process Matrix, by row, column */
 for(row = 0; row < size; row++)     {
 for (column = 0; column < size; column++) {
 id = column + row*10;
 work_orderp =
 (work_order_t *)malloc(sizeof(matrix_work_order_t));
 work_orderp->id = id;
 work_orderp->size = size;
 work_orderp->row = row;
 work_orderp->column = column;
 work_orderp->MA = &MA;
 work_orderp->MB = &MB;
 work_orderp->MC = &MC;
 (void *)work_orderp);
 }
 }
 /* Wait for peers to exit */
 for (i = 0; i < (size * size); i++) {
 }
 .
 .
 return 0;
 } In the serial version of our matrix-multiply program (Example 2-9), the main routine made a procedure call to invoke the mult routine. In the multithreaded version (Example 2-10), the main routine creates a peer thread to do the job. There is one complication, thoughthe mult routine as used in the serial version has many arguments, but the pthread_create function lets threads start only in routines that are passed a single argument. We'll explain the solution in the next section. Passing data to a new thread This limitation of pthread_create is annoying, but there is a standard solution that we employ in Example 2-11. We bundle everything the main routine wants to pass to its peer threads into a single structure. We call this structure the matrix_work_order_t, and it contains fields for all of the arguments passed to the serial program's mult routine. Our main routine passes each peer thread a pointer to a matrix_work_order_t structure as the last argument in the pthread_create call. A common error is not passing the new thread a work order structure that is unique. You may have noticed that our ATM and matrix-multiply programs allocate the data they intend to pass to their threads by using a malloc call just prior to the pthread_create call. If instead they used a static structure, or placed the malloc outside of the "for" loop, the main would continuously overwrite the contents of the same structure, and all threads would see values that were intended only for the most recently created thread. Using the matrix_work_order_t structure lets the main routine bundle various pieces of information into a single pthread_create argument, but the thread's start routine must accept only a single argument. It would be nice to reuse our mult routine as a start routine, but its multiple arguments make that impossible. This is common occurrence when trying to use legacy code with threads. Here too we'll use a standard solution. We'll define a new start routine for the peer threads. It'll be a simple wrapper over the preexisting mult routine. The peer_mult routine takes the pointer to the matrix_work_order_t structure that was passed in through pthread_createand uses the information from the structure to call the mult routine. Example: 2-11: Multithreaded Matrix-Multiply Program peer_multi Routine /*
 * Routine to start off a peer thread
 */
 void peer_mult(matrix_work_order_t *work_orderp)
 {
 mult(work_orderp->size,
 work_orderp->row,
 work_orderp->column,
 *(work_orderp->MA),
 *(work_orderp->MB),
 *(work_orderp->MC));
 free(work_orderp);
 } Synchronization in the matrix-multiply program Our multithreaded matrix-multiply example doesn't need much unusual synchronization:  The main thread must wait for the peers to complete. It uses pthread_join to do so.  No data synchronization is required because the peers never write to any shared locations.  Threads only read the values in the input arrays; we don't have to worry about synchronizing access because someone may change those values.  The computation of each element in the result array is completely independent of the results for any other element in the result array. We don't need to be concerned about the order in which threads complete the computation of their elements. Because thread programmers are rarely this lucky, we need to turn the page to Chapter 3.    Books24x7.com, Inc © 2000   Feedback