
Chapter 4 - Managing Pthreads

Pthreads Programming
Bradford Nichols, Dick Buttlar and Jacqueline Proulx Farrell
 Copyright © 1996 O'Reilly & Associates, Inc.

Keys: Using Thread-Specific Data
As a thread calls and returns from one routine or another, the local data on its stack comes and goes. To maintain long-lived data associated with a thread, we normally have two options:
 •Pass the data as an argument to each call the thread makes.
 •Store the data in a global variable associated with the thread.
These are perfectly good ways of preserving some types of data for the lifetime of a thread. However, in some instances, neither solution would work. Consider what might happen if you're rewriting a library of related routines to support multithreading. Most likely you don't have the option of redefining the library's call arguments. Because you don't necessarily know at compile time how many threads will be making library calls, it's very difficult to define an adequate number of global variables with the right amount of storage. Fortunately, the Pthreads standard provides a clever way of maintaining thread-specific data in such cases.
Pthreads bases its implementation of thread-specific data on the concept of a key—a kind of pointer that associates data with a specific thread. Although all threads refer to the same key, each thread associates the key with different data. This magic is accomplished by the threads library, which stores the pointer to data on a per-thread basis and keeps track of which item of data is associated with each thread.
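To make the idea concrete, here is a minimal sketch (not taken from the book's ATM example) in which two threads use the same key and each gets back only its own data. The names demo_key, struct slot, worker, and run_key_demo are hypothetical and exist only for this illustration; the Pthreads calls are the standard ones.

```c
#include <pthread.h>

static pthread_key_t demo_key;   /* hypothetical name: one key shared by all threads */

struct slot {
    int stored;      /* value this thread associates with the key */
    int retrieved;   /* value this thread reads back through the key */
};

static void *worker(void *arg)
{
    struct slot *s = arg;
    int *p;

    s->retrieved = -1;
    /* Associate this thread's own data with the shared key. */
    if (pthread_setspecific(demo_key, &s->stored) != 0)
        return NULL;
    /* Same key, but the library hands back this thread's pointer only. */
    p = pthread_getspecific(demo_key);
    if (p != NULL)
        s->retrieved = *p;
    return NULL;
}

/* Runs two threads with different values; returns 1 if each read back its own. */
int run_key_demo(int a, int b)
{
    struct slot s1 = { a, -1 }, s2 = { b, -1 };
    pthread_t t1, t2;
    int ok1, ok2;

    /* No destructor: the data lives in this function's frame, not on the heap. */
    if (pthread_key_create(&demo_key, NULL) != 0)
        return 0;
    ok1 = (pthread_create(&t1, NULL, worker, &s1) == 0);
    ok2 = (pthread_create(&t2, NULL, worker, &s2) == 0);
    if (ok1)
        pthread_join(t1, NULL);
    if (ok2)
        pthread_join(t2, NULL);
    pthread_key_delete(demo_key);
    return ok1 && ok2 && s1.retrieved == a && s2.retrieved == b;
}
```

Calling run_key_demo(15, 22) from a main routine should return 1 regardless of how the two threads interleave, because neither thread can see the other's value through the key.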
Suppose you were writing a communication module that allowed you to open a connection to another host name and read and write across it. A single-threaded version might look like Example 4-10.
Example 4-10: A Communications Module (specific.c)
static int cur_conn;
int open_connection(char *host)
{
       .
       .
       .
       cur_conn = ....
       .
       .
       .
}
int send_data(char *data)
{
       .
       .
       .
       write(cur_conn,...)
       .
       .
       .
}
int receive_data(char **data)
{
       .
       .
       .
       read(cur_conn,...)
       .
       .
       .
}
We've made the static variable cur_conn internal to this module. It stores the connection identifier between calls to send and receive data. When we add multiple threads to this module, we'll probably want them to communicate concurrently with the same or different hosts. As written, though, this module would have a rather surprising side effect for the thread that first opens a connection and starts to use it. Each subsequent open_connection call will reset the stored connection (cur_conn) in all threads!
If we couldn't use thread-specific data with keys, we'd still have a few ways of fixing this problem:
 •Add the connection identifier as an output argument to the open_connection call and as an input argument to the receive_data and send_data calls.
Although this would certainly work, it's a rather awkward solution for a couple of reasons. First, it forces each routine that currently uses the module to change as well. Any routine that makes calls to the module must store the connection identifier it receives from the open_connection call so it can use it in subsequent receive_data and send_data calls. Second, the connection variable is just an arbitrary value with meaning only within the module. As such, it should naturally be hidden within the module. If we did not force its use as a parameter to our module's interfaces, the caller would otherwise never reference it. It shouldn't even need to know about it.
 •Add an array (cur_conn) that contains entries for multiple connections.
This alone would not work, because the current version of our module has no way of returning to the caller of open_connection the index of the array entry at which it stored the connection identifier. We could proceed to add an argument to open_connection, receive_data, and send_data to pass back and forth an index into the cur_conn array, but that leads to the same disadvantages as our first solution. Furthermore, we don't know how much space to allocate for the array because the number of threads making connections can vary during the run of the program.
Now we can see more clearly the advantages of using thread-specific data. This way, our module can use a key to point to the connection identifier. We need no new arguments in the calls to the module. Each time a thread calls one of the routines in our module, our code uses the key to obtain its own particular connection identifier value.
Certain applications also use thread-specific data with keys to associate special properties with a thread in one routine and then retrieve them in another. Some examples include:
 •A resource management module (such as a memory manager or a file manager) could use a key to point to a record of the resources that have been allocated for a given thread. When the thread makes a call to allocate more resources, the module uses the key to retrieve the thread's record and process its request.
 •A performance statistics module for threads could use a key to point to a location where it saves the starting time for a calling thread.
 •A debugging module that maintains mutex statistics could use a key to point to a per-thread count of mutex locks and unlocks.
 •A thread-specific exception-handling module, when servicing a try call (which starts execution of the normal code path), could use a key to point to a location to which to jump in case the thread encounters an exception. The occurrence of an exception triggers a catch call to the module. The module checks the key to determine where to unwind the thread's execution.
 •A random number generation module could use a key to point to a location where it maintains a unique seed value and number stream for each thread that calls it to obtain random numbers.
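As an illustration of the last item in this list, the following is a rough sketch of a random number module that keeps a separate seed for each calling thread. Everything here is an assumption made for the example: the names rng_init, rng_seed, and rng_next are hypothetical, the generator is a simple linear congruential formula chosen for brevity rather than quality, and rng_init is assumed to be called once before any thread uses the module.

```c
#include <pthread.h>
#include <stdlib.h>

static pthread_key_t seed_key;   /* hypothetical: points to each thread's state */

/* Call once, before any thread uses the module. Returns 0 on success. */
int rng_init(void)
{
    /* free() already has the destructor signature void (*)(void *). */
    return pthread_key_create(&seed_key, free);
}

/* Seeds the calling thread's stream. Returns 0 on success, -1 on failure. */
int rng_seed(unsigned int seed)
{
    unsigned int *state = pthread_getspecific(seed_key);

    if (state == NULL) {                 /* first use by this thread */
        state = malloc(sizeof *state);
        if (state == NULL)
            return -1;
        if (pthread_setspecific(seed_key, state) != 0) {
            free(state);
            return -1;
        }
    }
    *state = seed;
    return 0;
}

/* Returns the next number (0..32767) in the calling thread's stream, or -1. */
int rng_next(void)
{
    unsigned int *state = pthread_getspecific(seed_key);

    if (state == NULL) {                 /* thread never seeded: default to 1 */
        if (rng_seed(1) != 0)
            return -1;
        state = pthread_getspecific(seed_key);
    }
    *state = *state * 1103515245u + 12345u;
    return (int)((*state / 65536u) % 32768u);
}
```

Callers never pass a seed or a handle to rng_next; the key supplies the context, which is the property the examples above have in common.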
These examples share some common characteristics:
 •They are libraries with internal state.
 •They don't require their callers to provide context in interface arguments. They don't burden the caller with maintaining this type of context in the global environment.
 •In a nonthreaded environment, the data to which the key refers would normally be stored as static data.
Note that thread-specific data is not a distinct data section like global, heap, and stack. It offers no special system protection or performance guarantees; it's as private or shared as other data in the same data section. There are no special advantages to using thread-specific data if you aren't writing a library and if you know exactly how many threads will be in your program at a given time. If this is the case, just allocate a global array with an element for each known thread and store each thread's data in a separate element.
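The global-array alternative might look something like the sketch below. It assumes a fixed thread count (NUM_THREADS, a hypothetical constant) and that each thread is told its own array index when it is created; the names per_thread_total, worker_arg, and run_array_demo are likewise invented for the illustration.

```c
#include <pthread.h>

#define NUM_THREADS 4                       /* known in advance (assumption) */

static int per_thread_total[NUM_THREADS];   /* one element per known thread */

struct worker_arg {
    int index;    /* which array element belongs to this thread */
    int amount;   /* how many times to increment it */
};

static void *worker(void *arg)
{
    const struct worker_arg *w = arg;
    int i;

    for (i = 0; i < w->amount; i++)
        per_thread_total[w->index]++;       /* only this thread touches this slot */
    return NULL;
}

/* Starts the threads, waits for them, and returns the sum of all slots. */
int run_array_demo(void)
{
    pthread_t tids[NUM_THREADS];
    struct worker_arg args[NUM_THREADS];
    int started = 0, sum = 0, i;

    for (i = 0; i < NUM_THREADS; i++) {
        per_thread_total[i] = 0;
        args[i].index = i;
        args[i].amount = (i + 1) * 100;
        if (pthread_create(&tids[i], NULL, worker, &args[i]) != 0)
            break;
        started++;
    }
    for (i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        sum += per_thread_total[i];
    }
    return sum;
}
```

Notice that the index has to travel with the thread as an argument. That is acceptable in an application that creates its own threads, but it is exactly the burden a library cannot place on its callers.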
Initializing a Key: pthread_key_create
Let's rewrite our ATM server's communication module so that it uses a key to point to the connection information for each thread. When a thread calls the open_connection routine, the routine will store the thread-specific connection identifier using a key. We'll initialize the key, as shown in Example 4-11.
Example 4-11: A Communication Module Using Keys (specific.c)
#include <pthread.h>
#include <stdlib.h>
static pthread_key_t conn_key;
/* Destructor: must have the signature void (*)(void *). */
void free_conn(void *connp)
{
       free(connp);
}
int init_comm(void)
{
       .
       .
       .
       pthread_key_create(&conn_key, free_conn);
       .
       .
       .
}
We've defined conn_key, the key we're using to point to the thread-specific connection identifier, as a static variable within the module. We initialize it by calling pthread_key_create in the init_comm routine. The pthread_key_create call takes two arguments: a pointer to the key and a destructor routine. The library calls the destructor routine to clean up the data stored in the key when a thread exits. Note that storing a new value with pthread_setspecific does not invoke the destructor on the old value; releasing the old data in that case is the caller's job. We'll discuss destructor routines some more in a moment.
When you're done with a key, call pthread_key_delete to allow the library to recover resources associated with the key itself.
Although the pthread_key_create function initializes a key that threads can use, it neither allocates memory for the data to be associated with the key, nor associates the data to the key. Next we'll show you how to handle the actual data.
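One practical point about initialization: pthread_key_create should run exactly once for a given key, no matter how many threads call into the module. A common way to arrange this is with pthread_once. The sketch below is one possible arrangement, not the book's own code; conn_key_once, conn_key_status, make_conn_key, and shutdown_comm are hypothetical names introduced here.

```c
#include <pthread.h>
#include <stdlib.h>

static pthread_key_t conn_key;
static pthread_once_t conn_key_once = PTHREAD_ONCE_INIT;   /* hypothetical */
static int conn_key_status = -1;    /* result of the one-time key creation */

/* Destructor: releases the heap storage holding a connection identifier. */
static void free_conn(void *connp)
{
    free(connp);
}

/* Runs at most once per process, however many threads call init_comm. */
static void make_conn_key(void)
{
    conn_key_status = pthread_key_create(&conn_key, free_conn);
}

/* Safe to call from any thread, any number of times. Returns 0 on success. */
int init_comm(void)
{
    pthread_once(&conn_key_once, make_conn_key);
    return conn_key_status;
}

/*
 * Hypothetical shutdown routine. Call only after a successful init_comm and
 * only when no thread still needs the key: pthread_key_delete does not run
 * destructors, so any remaining per-thread data must be freed by the caller.
 */
int shutdown_comm(void)
{
    return pthread_key_delete(conn_key);
}
```

With this arrangement every public routine in the module could begin by calling init_comm, so callers would not need to remember a separate initialization step.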
Associating Data with a Key
The chief trick to using keys is that you must never assign a value directly to a key, nor can you use a key itself in an expression. You must always use pthread_setspecific and pthread_getspecific to refer to any data item that is being managed by a key. In Example 4-12, our communication module's open_connection routine calls pthread_setspecific to associate the conn_key key with a thread-specific pointer to an integer.
Example 4-12: Storing Data in a Key (specific.c)
int open_connection(char *host)
{
       int *connp;
       .
       .
       .
       connp = (int *)malloc(sizeof(int));
       *connp = ...
       pthread_setspecific(conn_key, (void *)connp);
       .
       .
       .
}
When a thread calls the open_connection routine, the routine calls malloc to allocate storage for an integer on the heap and sets the pointer connp to point at it. The routine then uses connp to set up a connection and store the connection identifier. Once the connection is complete, the pthread_setspecific call stores connp in a thread-specific location associated with conn_key.
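A more defensive version of this routine might check each call and cope with a thread that opens a second connection. The following is only a sketch under stated assumptions: fake_connect is a hypothetical stand-in for the elided connection setup (it simply returns the length of the host name), and current_connection is a hypothetical helper added so the stored value can be inspected.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

static pthread_key_t conn_key;

static void free_conn(void *connp)
{
    free(connp);
}

int init_comm(void)
{
    return pthread_key_create(&conn_key, free_conn);
}

/* Hypothetical stand-in for real connection setup; NOT a real network call. */
static int fake_connect(const char *host)
{
    return (int)strlen(host);
}

/* Returns 0 on success, -1 on failure. */
int open_connection(const char *host)
{
    int *old;
    int *connp = malloc(sizeof *connp);

    if (connp == NULL)
        return -1;
    *connp = fake_connect(host);

    /* Remember any value this thread stored earlier so it is not leaked. */
    old = pthread_getspecific(conn_key);
    if (pthread_setspecific(conn_key, connp) != 0) {
        free(connp);
        return -1;
    }
    free(old);          /* free(NULL) is a no-op on the first call */
    return 0;
}

/* Hypothetical helper: the calling thread's identifier, or -1 if none. */
int current_connection(void)
{
    int *connp = pthread_getspecific(conn_key);

    return (connp != NULL) ? *connp : -1;
}
```

The explicit free of the old pointer matters because replacing a value with pthread_setspecific does not run the key's destructor; only thread exit does.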
The pthread_setspecific routine takes, as an argument, a pointer to the data to be associated with the key—not the data itself. Figure 4-2 shows what the conn_key key would look like after the first thread used it to store its thread-specific value.
Figure 4-2: A key after a value is set
Figure 4-3: A second value stored in the key
The open_connection routine, executing in Thread 1's context, pushes the connp variable onto the thread's stack. After the call to malloc, connp points to storage for an integer in the heap section of the process. The detailed communication code then uses the connp pointer to set the value of the connection identifier to 15. Once the connection is set up, the pthread_setspecific call stores the pointer to the allocated heap storage for this thread with the conn_key key. When Thread 1 returns from its open_connection procedure call, its stack frame for the procedure call is deallocated, including its connp pointer. The only place in which a pointer to Thread 1's connection identifier remains is within the key.
When another thread calls open_connection, as shown in Figure 4-3, the process is repeated.
Now Thread 2 has a stack frame for its open_connection procedure call. After the call to malloc, connp points to storage for an integer in a different area of the process's heap section. The detailed communications code comes up with a different connection identifier for Thread 2, but the pthread_setspecific call stores a pointer to this value, 22, in the very same key as it stored a pointer to Thread 1's connection identifier. When Thread 2 returns from its open_connection procedure call, its stack frame for the procedure call is deallocated, including its connp pointer. The only place in which a pointer to Thread 2's connection identifier remains is within the key.
Retrieving Data from a Key
The send_data and receive_data routines call pthread_getspecific to retrieve the connection identifier for the calling thread. Each routine uses a pointer, saved_connp, to point to the connection identifier, as shown in Example 4-13.
Example 4-13: Retrieving Data from a Key (specific.c)
int send_data(char *data)
{
       int *saved_connp;
       .
       .
       .
       /* pthread_getspecific takes only the key and returns the stored pointer */
       saved_connp = pthread_getspecific(conn_key);
       write(*saved_connp,...);
       .
       .
       .
}
int receive_data(char **data)
{
       int *saved_connp;
       .
       .
       .
       saved_connp = pthread_getspecific(conn_key);
       read(*saved_connp,...);
       .
       .
       .
}
When Thread 1 calls the send_data or receive_data routine, as shown in Figure 4-4, the routine calls pthread_getspecific to return to saved_connp the pointer to the thread-specific connection identifier associated with the conn_key key. It now has access to its connection identifier (15) and can write or read across the connection. When the second thread calls send_data or receive_data, it likewise retrieves its connection identifier (22) using the key.
Figure 4-4: Retrieving a stored value from a key
The pthread_getspecific function returns NULL if no value has been associated with a key. If the call returns NULL inside receive_data or send_data, it's likely that the calling thread neglected to make a prior call to open_connection. The routines in Example 4-13 omit this check for brevity; real code should test the pointer before dereferencing it.
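A version of send_data that guards against the NULL case could be sketched as follows. The choice to report the problem by returning -1 with errno set to ENOTCONN is an assumption made for this example, not something the module above specifies, and the key is created without a destructor only to keep the sketch short.

```c
#include <errno.h>
#include <pthread.h>
#include <string.h>
#include <unistd.h>

static pthread_key_t conn_key;

/* Creates the key; no destructor here, to keep the sketch short. */
int init_comm(void)
{
    return pthread_key_create(&conn_key, NULL);
}

/* Returns 0 on success, -1 on failure with errno set. */
int send_data(const char *data)
{
    int *saved_connp = pthread_getspecific(conn_key);

    if (saved_connp == NULL) {
        /* This thread never called open_connection (assumed convention). */
        errno = ENOTCONN;
        return -1;
    }
    if (write(*saved_connp, data, strlen(data)) < 0)
        return -1;              /* errno already set by write */
    return 0;
}
```

A thread that calls send_data before opening a connection now gets an error return instead of dereferencing a NULL pointer.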
Destructors
We've shown that keys often store pointers to thread-specific data that's been allocated on the heap. Memory leaks can occur when threads exit and leave behind the thread-specific data that was associated with keys. For this reason we should specify a destructor routine, or destructor for short, when we create a key. (Passing NULL instead is allowed, but then the library performs no cleanup.) When a thread exits with a non-NULL value associated with the key, the library invokes the destructor on the thread's behalf, passing to it the pointer to the thread-specific data currently associated with the key. In this manner, the destructor acts as a convenient plug for potential memory leaks, deallocating memory that would otherwise be forgotten and go to waste.
The destructor can be any routine you choose, provided it takes a single void * argument and returns nothing. In our init_comm routine shown in Example 4-11, we used a routine named free_conn. For the simple integer being stored, free_conn need consist of nothing more than a call to the free library routine. If we were using more complex data, such as a linked list, the destructor would be a more complex routine that walked down the list, freeing each node. An even more complex example would be a data structure that includes handles on system resources, such as sockets and files, that the destructor would need to close.
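The linked-list case might be sketched like this. The names struct node, list_key, free_list, and run_destructor_demo are hypothetical, and the nodes_freed counter exists only so the effect of the destructor can be observed after the thread has been joined.

```c
#include <pthread.h>
#include <stdlib.h>

struct node {
    int value;
    struct node *next;
};

static pthread_key_t list_key;
static int nodes_freed;      /* demo only: read after pthread_join */

/* Destructor: walks the list, freeing each node. */
static void free_list(void *head)
{
    struct node *n = head;

    while (n != NULL) {
        struct node *next = n->next;   /* save before freeing */
        free(n);
        nodes_freed++;
        n = next;
    }
}

static void *worker(void *arg)
{
    int count = *(int *)arg;
    struct node *head = NULL;
    int i;

    for (i = 0; i < count; i++) {
        struct node *n = malloc(sizeof *n);
        if (n == NULL)
            break;
        n->value = i;
        n->next = head;
        head = n;
    }
    /* If the key cannot take the list, clean up by hand. */
    if (pthread_setspecific(list_key, head) != 0)
        free_list(head);
    return NULL;    /* on exit, the library calls free_list for a non-NULL value */
}

/* Returns the number of nodes the destructor freed, or -1 on setup failure. */
int run_destructor_demo(int count)
{
    pthread_t t;

    nodes_freed = 0;
    if (pthread_key_create(&list_key, free_list) != 0)
        return -1;
    if (pthread_create(&t, NULL, worker, &count) != 0) {
        pthread_key_delete(list_key);
        return -1;
    }
    pthread_join(t, NULL);
    pthread_key_delete(list_key);
    return nodes_freed;
}
```

The worker never frees its list explicitly on the normal path; the cleanup happens as part of thread exit, which is the behavior the destructor mechanism is meant to provide.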
