
Chapter 5 - Pthreads and UNIX

Pthreads Programming
Bradford Nichols, Dick Buttlar and Jacqueline Proulx Farrell
 Copyright © 1996 O'Reilly & Associates, Inc.

Threadsafe Library Functions and System Calls
Up to this point we've spent a lot of effort to ensure that multiple threads can execute cleanly and efficiently in our own code. However, it's easy to forget that our applications spend a lot of time in system-supplied libraries (and third-party-supplied libraries), running code over which we have no control whatsoever. If the library fails to recognize potential race conditions when its data is shared among threads and neglects to enforce appropriate synchronization, our program will fail—just as if it ignored these issues itself! 
This problem isn't an issue just for threaded programs. Race conditions can also occur in traditional, single-threaded programs that use signal handlers or that call routines recursively. A single-threaded program of this kind may have the same routine in progress in various call frames on its process stack. 
Threadsafe and Reentrant Functions
The degree to which a library function or routine allows itself to have multiple instances of itself in progress at the same time is known as its reentrancy. The behavior of a reentrant function doesn't vary whether one call or multiple calls to it are in progress. For multiple, simultaneous function calls to work properly, a function cannot write to static data. If it does, it creates a race condition with regard to the data, and its callers risk obtaining bad results. 
The Pthreads standard not only requires that almost all system-supplied library functions be reentrant but also requires them to be threadsafe. A threadsafe function has been designed to allow multiple, simultaneous calls specifically from threads. Whereas the normal mechanism for making a function reentrant is to remove all references to global data, a threadsafe function can employ thread synchronization primitives (such as mutexes and condition variables) to protect the global data. 
Example of Thread-Unsafe and Threadsafe Versions of the Same Function
We'll show the behavior of a function that disregards the basic rules of thread safety in Example 5-5. Although the example is contrived and oversimplified, it does illustrate how certain functions were designed on many systems before Pthreads support was added. Although it may be obvious to you that using a fixed-length global buffer in a callable library is bad programming style, there are a couple of important lessons to be learned here: 
 What may be bad programming style in a library called by different processes will be deadly to threads calling the library from the same process. 
 It's a big, wonderful, and sometimes dangerous world out there! Know the types of libraries your threads hang around in! 
In Example 5-5, our thread-unsafe function is called reverse_string. It uses a static buffer (work_buffer) as a temporary workspace while it reverses the order of the characters in an input string. 
Example 5-5: A Thread-Unsafe String Reversing Routine (reverse_string.c)
static char work_buffer[100];

void reverse_string(char *in_str)
{
    int size = 0, i = 0, j = 0;

    /* Find the end of in_str, leaving room for the terminating null */
    while ((in_str[size] != '\0') && (size < 99)) {
        size++;
    }
    /* Copy from in_str into buffer, reversing it */
    for (i = size-1; i > -1; i--) {
        work_buffer[j++] = in_str[i];
    }
    work_buffer[j] = '\0';
    /* Copy back from buffer to in_str */
    for (i = 0; i < size+1; i++)
        in_str[i] = work_buffer[i];
}
Here's how a race condition develops when two threads call reverse_string at the same time: 
 1. Thread A calls the reverse_string function with the input string "the cat". The scheduler preempts the thread at the point at which the function has copied "tac e" into work_buffer.
 2. Thread B now calls reverse_string with the input string "dog house". The function writes "esuoh god" into work_buffer and returns. 
 3. When Thread A continues, reverse_string continues copying "the cat" into work_buffer. When it completes, it returns the string "esuohht" instead of the correct string "tac eht". 
The problem with reverse_string does not lie with the indexes it uses; the indexes are automatic data that is maintained by each individual thread. The problem is in the static array work_buffer.
We can easily make reverse_string threadsafe by the few keystrokes it takes to move the buffer from the static variable area to the automatic variable area, where we rename it my_buffer, in Example 5-6. 
Example 5-6: A Threadsafe String Reversing Routine (reverse_string.c)
void reverse_string(char *in_str)
{
    int size = 0, i = 0, j = 0;
    char my_buffer[100];

    /* Find the end of in_str, leaving room for the terminating null */
    while ((in_str[size] != '\0') && (size < 99)) {
        size++;
    }
    /* Copy from in_str into buffer, reversing it */
    for (i = size-1; i > -1; i--) {
        my_buffer[j++] = in_str[i];
    }
  .
  .
  .
}
When calling this version of reverse_string, each thread gets its own copy of my_buffer on its own per-thread stack. The danger of corruption is removed because no other thread can access the buffer. 
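Because the listing above elides the tail of the function, here is one way the whole routine might look in a compilable form, together with a small driver that runs the "the cat" and "dog house" calls from two threads. This is a sketch under our own assumptions, not the book's listing: the BUF_SIZE constant, the reverse_worker start routine, and the run_reverse_demo driver are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define BUF_SIZE 100   /* assumed workspace size, matching the example */

/* Threadsafe: the workspace is an automatic variable, so every call
 * (and therefore every thread) gets its own copy on its own stack. */
void reverse_string(char *in_str)
{
    char my_buffer[BUF_SIZE];
    size_t size = 0, i, j = 0;

    assert(in_str != NULL);

    /* Find the end of in_str, leaving room for the terminating null */
    while (in_str[size] != '\0' && size < BUF_SIZE - 1)
        size++;

    /* Copy from in_str into the buffer, reversing it */
    for (i = size; i > 0; i--)
        my_buffer[j++] = in_str[i - 1];
    my_buffer[j] = '\0';

    /* Copy back from the buffer to in_str */
    memcpy(in_str, my_buffer, size + 1);
}

/* Hypothetical start routine: reverse the string passed as the argument */
static void *reverse_worker(void *arg)
{
    reverse_string((char *)arg);
    return NULL;
}

/* Hypothetical driver: returns 0 if both strings come back reversed */
int run_reverse_demo(void)
{
    char a[] = "the cat";
    char b[] = "dog house";
    pthread_t ta, tb;

    if (pthread_create(&ta, NULL, reverse_worker, a) != 0)
        return -1;
    if (pthread_create(&tb, NULL, reverse_worker, b) != 0) {
        pthread_join(ta, NULL);
        return -1;
    }
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);

    return (strcmp(a, "tac eht") == 0 && strcmp(b, "esuoh god") == 0) ? 0 : 1;
}
```

Each thread passes its own string and works in its own stack buffer, so neither the input nor the workspace is shared and no mutex is needed.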
Functions That Return Pointers to Static Data
Notice how transparent our solution is to the race condition in reverse_string. Because we didn't add or change any parameters, its callers, Thread A and Thread B, don't need to change (unless they depended on the previously incorrect results!). Unfortunately, for other thread-unsafe functions, there isn't such a simple and tidy solution. What if the function's interface returns a pointer to static data? Its callers are bound to this interface (and for many of them, namely the single-threaded callers, it works fine). Moreover, this type of interface is not uncommon. You often find it in functions that cache information (such as directory listings, host names, or times). It's often easier and quicker to return a pointer to the static results than to copy information into a caller-specified buffer. 
The only way to produce a threadsafe version of this type of function is to change its argument list. Regrettably, the threadsafe version will no longer be compatible with the previous version, thus causing some amount of inconvenience to its callers. 
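To make the interface change concrete, the sketch below puts the two styles side by side. Both functions are hypothetical (make_label and make_label_r are names we made up for illustration, not library calls): the first returns a pointer to its own static buffer, while the second writes into storage that the caller provides.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Thread-unsafe style (hypothetical function): every call overwrites
 * the same static buffer, and every caller gets a pointer to it. */
const char *make_label(int id)
{
    static char label[32];

    snprintf(label, sizeof label, "item-%d", id);
    return label;
}

/* Threadsafe style (hypothetical function): the caller supplies the
 * storage, so no data is shared between calls. The extra parameters
 * are what make it incompatible with the original interface. */
char *make_label_r(int id, char *buf, size_t buflen)
{
    assert(buf != NULL && buflen > 0);

    snprintf(buf, buflen, "item-%d", id);
    return buf;
}
```

Two calls to make_label return the same address, so the second call silently replaces the first caller's result. With make_label_r, each caller keeps its own result for as long as it keeps its own buffer.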
Library Use of errno
In Chapter 1, Why Threads?, we pointed out that Pthreads library functions do not use errno. However, traditional UNIX and POSIX.1 system calls (such as read and write) do use errno, and this could be a big problem for multithreaded programs that call these functions. 
When a program makes an unsuccessful call to one of these functions, the system sets the value of the global int variable errno to an error number. The programmer first tests the function's return value to see if an error has occurred and then reads errno to determine why. Typically, the libc function perror is used to decode the error and print an explanatory string to standard error. 
The following snippets of code show two threads making an unsuccessful system call at the same time. 
Thread 1                               Thread 2
amt = read(...);                       rtn = ioctl(...);
if (amt < 0) {                         if (rtn < 0) {
    fprintf(stderr,                        fprintf(stderr,
            "error read( ) %d",                    "error ioctl( ) %d",
            errno);                                errno);
    exit(-1);                              exit(-1);
}                                      }
Because there is only one errno global variable for the entire process, the failing read call and the failing ioctl call encounter a race condition when they write to it. Consequently, when Thread 1 reads and prints out the value of errno, it can't tell whether the error value is the result of its read call or Thread 2's ioctl call. 
The Pthreads standard recognizes this problem and dictates that each thread must perceive errno as having a thread-private value, independent of the errno values seen by other threads. To achieve this, Pthreads library implementations define errno as a macro. When expanded, this macro yields a thread-specific errno value. Thus, existing error-checking code doesn't need to change to work within a thread. In fact, our examples would work, too. 
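A minimal sketch of this behavior follows. Two threads each make a call that is expected to fail for a different reason, and each records the errno value it observes. The helper names (fail_read, fail_open, run_errno_demo) are ours, and the path passed to open is a made-up one that we assume does not exist on the system.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <unistd.h>

/* Read from an invalid descriptor; the call fails with EBADF. */
static void *fail_read(void *arg)
{
    char c;

    assert(arg != NULL);
    if (read(-1, &c, 1) < 0)
        *(int *)arg = errno;      /* this thread's own errno */
    return NULL;
}

/* Open a path assumed not to exist; the call fails with ENOENT. */
static void *fail_open(void *arg)
{
    assert(arg != NULL);
    if (open("/nonexistent/dir/file", O_RDONLY) < 0)
        *(int *)arg = errno;      /* this thread's own errno */
    return NULL;
}

/* Hypothetical driver: returns 0 if each thread saw its own error */
int run_errno_demo(void)
{
    pthread_t t1, t2;
    int seen_by_read = 0, seen_by_open = 0;

    if (pthread_create(&t1, NULL, fail_read, &seen_by_read) != 0)
        return -1;
    if (pthread_create(&t2, NULL, fail_open, &seen_by_open) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    return (seen_by_read == EBADF && seen_by_open == ENOENT) ? 0 : 1;
}
```

Note that the error-checking code inside each thread is written exactly as it would be in a single-threaded program; the thread-private behavior comes entirely from the way the implementation defines errno.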
The Pthreads Standard Specifies Which Functions Must Be Threadsafe
One of the most time-consuming aspects of deploying Pthreads for many system vendors is the effort required to make their libraries and system calls threadsafe. The Pthreads standard dictates that almost all POSIX.1 calls must be threadsafe. (Note that the POSIX.1 calls include not only library functions but also system calls such as dup, chmod, getpid, and open, and C language bindings such as atoi, malloc, printf, and scanf.) 
The small number of exceptions allowed by the standard include: 
 Calls whose argument lists include static data 
 Calls for which performance is a concern 
 Calls that involve file locking 
Additionally, a vendor may make certain of its non-POSIX calls threadsafe. Before using any non-POSIX interface in a multithreaded application, ensure that it is threadsafe by checking your system's documentation. 
Alternative interfaces for functions that return static data
The POSIX.1 readdir and localtime functions are good examples of the sort of function that returns to its caller a pointer to static data (either a structure or string). Each time you call one of these functions, it overwrites its static data area. In nonthreaded applications, this means you need to use the returned pointer to copy the data somewhere else; this may be annoying, but it does not prevent the call from returning correct results. However, when you call one of these routines from multiple threads at the same time, it will return corrupt results. 
We've already mentioned that this type of function can be made threadsafe only by some visible change to its call interface. The major drawback to this is that we'd break a lot of programs if we change the existing interface of a call like readdir or localtime.
The solution the Pthreads standard adopts for functions like these is to leave the existing functions alone and create new, alternative versions of the functions that are threadsafe. In the new threadsafe functions (whose names end in _r for reentrant), the caller provides a pointer to a buffer to which the function copies its results. Each time a thread calls the threadsafe version of one of these functions, it maintains the data unique to its call in its own buffer. 
The Pthreads standard defines the following threadsafe versions of existing POSIX.1 functions: 
asctime_r
ctime_r
getgrgid_r
getgrnam_r
getlogin_r
getpwnam_r
getpwuid_r
gmtime_r
localtime_r
rand_r
readdir_r
strtok_r
ttyname_r
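As a rough sketch of how these interfaces look from the caller's side, the two helpers below use strtok_r and gmtime_r. In both cases the state or result lives in a variable owned by the caller rather than in hidden static storage. The helper names count_fields and utc_year are hypothetical and exist only for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <time.h>

/* Count the comma-separated fields in line (which is modified in place).
 * strtok_r keeps its scanning position in the caller's saveptr. */
int count_fields(char *line)
{
    char *saveptr;
    char *tok;
    int n = 0;

    assert(line != NULL);
    for (tok = strtok_r(line, ",", &saveptr); tok != NULL;
         tok = strtok_r(NULL, ",", &saveptr))
        n++;
    return n;
}

/* Return the UTC calendar year for t, or -1 on failure.
 * gmtime_r fills in the caller's struct tm. */
int utc_year(time_t t)
{
    struct tm result;

    if (gmtime_r(&t, &result) == NULL)
        return -1;
    return result.tm_year + 1900;
}
```

Because saveptr and result are automatic variables, two threads can run these helpers at the same time without disturbing each other's data.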
Additional routines for performance considerations
The getc, getchar, putc, and putchar functions are commonly used POSIX.1 functions that perform I/O to standard input and output. Because of the frequency with which certain applications call them, the Pthreads standard committee determined that making these functions threadsafe would result in serious performance hits for some single-threaded applications (which don't need the extra thread-specific synchronization). As a result, it decreed that, while vendors should make the existing functions threadsafe, they should also offer new versions of the functions that provide better performance.
The new, thread-unsafe, better-performing versions of the getc, getchar, putc, and putchar functions are getc_unlocked, getchar_unlocked, putc_unlocked, and putchar_unlocked.
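One situation in which the unlocked variants are reasonable is a stream that only one thread ever touches. The sketch below counts lines with getc_unlocked under that assumption; count_lines is a hypothetical helper, and it is the caller's job to guarantee that no other thread uses fp during the call.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>

/* Count the newline characters remaining in fp.
 * Assumes fp is used by the calling thread only, so the
 * per-character locking done by getc is unnecessary. */
long count_lines(FILE *fp)
{
    long lines = 0;
    int c;

    assert(fp != NULL);
    while ((c = getc_unlocked(fp)) != EOF) {
        if (c == '\n')
            lines++;
    }
    return lines;
}
```

If the stream might be shared, the plain getc is the safer choice.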
File-locking functions for threads
It's fairly common for a thread to read and write to a shared file. Although POSIX.1 defines calls (such as flock) that allow a thread to synchronize access to a file shared with another process, it does not define calls that allow multiple threads within the same process to synchronize similar activity. A thread that calls flock effectively locks a file against access by threads from other processes but leaves it wide open to other threads in its own process.
To synchronize access to a file shared with threads from the same process as well as those of other processes, a thread could use a mutex in conjunction with its flock calls. However, the Pthreads standard defines functions, listed in Table 5-2, that give you this degree of synchronization with a lot less effort.
Table 5-2: New Routines for Thread-Specific File Locking
Function        Description
flockfile       Locks a file on a per-thread basis
ftrylockfile    Tries to lock a file on a per-thread basis (returns immediately)
funlockfile     Unlocks a file on a per-thread basis
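The sketch below shows one way these routines might be used: several threads write whole lines to a shared stream, and each line is bracketed by flockfile and funlockfile so that its characters cannot interleave with another thread's. Inside the locked region it uses putc_unlocked, since the stream lock is already held. The names write_line, writer, and run_flockfile_demo, the word list, and the thread and line counts are all assumptions made for the illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define NTHREADS 4
#define NLINES   50

static FILE *shared_fp;   /* stream shared by all the writer threads */

/* Write one whole line while holding the stream's lock. */
static void write_line(FILE *fp, const char *s)
{
    assert(fp != NULL && s != NULL);

    flockfile(fp);
    while (*s != '\0')
        putc_unlocked(*s++, fp);
    putc_unlocked('\n', fp);
    funlockfile(fp);
}

static void *writer(void *arg)
{
    for (int i = 0; i < NLINES; i++)
        write_line(shared_fp, (const char *)arg);
    return NULL;
}

/* Hypothetical driver: returns 0 if every line in the file is intact */
int run_flockfile_demo(void)
{
    static const char *words[NTHREADS] = { "alpha", "bravo", "charlie", "delta" };
    pthread_t tid[NTHREADS];
    char line[64];
    int started = 0, count = 0, bad = 0;

    shared_fp = tmpfile();
    if (shared_fp == NULL)
        return -1;

    for (int i = 0; i < NTHREADS; i++) {
        if (pthread_create(&tid[i], NULL, writer, (void *)words[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);

    /* Read the file back and check that each line is one whole word */
    rewind(shared_fp);
    while (fgets(line, sizeof line, shared_fp) != NULL) {
        int ok = 0;

        line[strcspn(line, "\n")] = '\0';
        for (int k = 0; k < NTHREADS; k++)
            if (strcmp(line, words[k]) == 0)
                ok = 1;
        if (!ok)
            bad++;
        count++;
    }
    fclose(shared_fp);

    return (started == NTHREADS && bad == 0 && count == NTHREADS * NLINES) ? 0 : 1;
}
```

The lock covers a whole line rather than a single character, which is the granularity the application cares about here.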
Where are the threadsafe functions?
The Pthreads standard specifies that the threadsafe versions of most POSIX.1 functions must be available on your platform, but where?
Here, too, you should consult your operating system's documentation. Some systems may support the new threadsafe versions of standard functions in one library while continuing to support the thread-unsafe versions in another.* These systems may keep the original functions in a standard library (named libxxx.a) and the threadsafe functions in a new library, libxxx_r.a.
 *There are a number of reasons the thread-unsafe libraries may still be available, including performance (the traditional functions may be faster than the threadsafe ones) and quality (the threadsafe functions may not have been tested as much as the traditional ones).
Using Thread-Unsafe Functions in a Multithreaded Program
Speaking of safety, if you are intent on walking on the sea wall during high tide, make sure you do so only when the wind has stopped and you're wearing your good sneakers—and stay away from those rocks! Similarly, if you are determined that your multithreaded program needs the unique functionality of a system library or toolkit that is thread-unsafe, you can still use it in your multithreaded application. However, if you do, you must treat the entire function call as if it were a shared resource and use appropriate synchronization. 
The simplest synchronization scheme is to allow only one thread in your program to make calls using the thread-unsafe interface. A little more complex solution would be to associate a mutex or a condition variable with some or all of the interface calls. Any thread in your program must lock the appropriate mutex before calling the thread-unsafe function it protects. 
For example, assume that we failed to make reverse_string threadsafe. In Example 5-7, we'll insert some code in a multithreaded program that calls it, surrounding the reverse_string call with calls to lock and unlock a reverse_string_mutex lock and defining a macro that will invoke this whole block of code. Now any thread in our program can use the safe_reverse_string macro to launch a threadsafe call to the thread-unsafe reverse_string function. 
Example 5-7: Using a Mutex with a Thread-Unsafe Interface (reverse_string.c)
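pthread_mutex_t reverse_string_mutex = PTHREAD_MUTEX_INITIALIZER;

/* do { } while (0) makes the macro expand to a single statement */
#define safe_reverse_string(x) \
    do { \
        pthread_mutex_lock(&reverse_string_mutex); \
        reverse_string(x); \
        pthread_mutex_unlock(&reverse_string_mutex); \
    } while (0)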
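For readers who want to try the idea end to end, here is a sketch that pairs a thread-unsafe reverse_string (written in the spirit of Example 5-5, with a shared static buffer) with a mutex-protected wrapper. We've written the wrapper as an ordinary function instead of a macro, which is our own design choice rather than the book's: it gives the same protection but evaluates its argument once and can be stepped through in a debugger. The worker routine, the run_safe_reverse_demo driver, and the repetition count are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

static char work_buffer[100];   /* shared workspace: the thread-unsafe part */

/* Thread-unsafe routine in the spirit of Example 5-5 */
void reverse_string(char *in_str)
{
    size_t size, j = 0;

    assert(in_str != NULL);
    size = strlen(in_str);
    if (size > sizeof work_buffer - 1)
        size = sizeof work_buffer - 1;

    for (size_t i = size; i > 0; i--)
        work_buffer[j++] = in_str[i - 1];
    work_buffer[j] = '\0';
    memcpy(in_str, work_buffer, size + 1);
}

static pthread_mutex_t reverse_string_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Function alternative to the safe_reverse_string macro */
void safe_reverse_string(char *s)
{
    pthread_mutex_lock(&reverse_string_mutex);
    reverse_string(s);
    pthread_mutex_unlock(&reverse_string_mutex);
}

/* Reverse the string an odd number of times, so it ends up reversed */
static void *worker(void *arg)
{
    for (int i = 0; i < 1001; i++)
        safe_reverse_string((char *)arg);
    return NULL;
}

/* Hypothetical driver: returns 0 if both strings come back reversed */
int run_safe_reverse_demo(void)
{
    char a[] = "the cat";
    char b[] = "dog house";
    pthread_t ta, tb;

    if (pthread_create(&ta, NULL, worker, a) != 0)
        return -1;
    if (pthread_create(&tb, NULL, worker, b) != 0) {
        pthread_join(ta, NULL);
        return -1;
    }
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);

    return (strcmp(a, "tac eht") == 0 && strcmp(b, "esuoh god") == 0) ? 0 : 1;
}
```

The protection holds only as long as every caller goes through the wrapper. A single direct call to reverse_string from any thread reopens the race.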
