
Chapter 5 - Pthreads and UNIX

Pthreads Programming
Bradford Nichols, Dick Buttlar and Jacqueline Proulx Farrell
 Copyright © 1996 O'Reilly & Associates, Inc.

Threads and Process Management
On a Pthreads-compliant system, calls that manipulate processes, like fork and exec, still behave in the way they always have for nonthreaded programs. Let's see what happens when we make these calls from a multithreaded process. 
Calling fork from a Thread
A process creates another process by issuing a fork call. The newly created child process has a new process ID but starts with the same memory image and state as its parent. At its birth it's an exact clone of its parent, starting execution at the point of its parent's fork call in the same program. Often, the new process immediately calls exec to replace its parent's program with a new program. It then sets out on its own business. 
In a Pthreads-compliant implementation, the fork call always creates a new child process with a single thread, regardless of how many threads its parent may have had at the time of the call. Furthermore, the child's thread is a replica of the thread in the parent that called fork—including a process address space shared by all of its parent's threads and its parent thread's per-thread stack. 
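One way to see the single-threaded child for yourself is sketched below. The names (ticks, worker, child_has_no_worker) are ours and purely illustrative. A worker thread in the parent bumps a counter continuously; after the fork, the child samples its own copy of the counter twice. Because the worker was not replicated, we would expect the child's copy to stand still while the parent's keeps moving.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static atomic_long ticks;        /* bumped continuously by the worker thread */
static atomic_int  stop_worker;  /* set by the parent to end the worker */

static void *worker(void *arg)
{
    (void)arg;
    while (!atomic_load(&stop_worker))
        atomic_fetch_add(&ticks, 1);
    return NULL;
}

static void nap(void)            /* sleep for about 50 ms */
{
    struct timespec ts = { 0, 50 * 1000 * 1000 };
    nanosleep(&ts, NULL);
}

/* Returns 1 if the counter is frozen in the child while it still
 * advances in the parent, 0 if not, and -1 on error. */
int child_has_no_worker(void)
{
    pthread_t tid;
    int status = 0;

    atomic_store(&ticks, 0);
    atomic_store(&stop_worker, 0);
    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return -1;
    while (atomic_load(&ticks) == 0)    /* wait until the worker is running */
        nap();

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: only the thread that called fork exists here. */
        long before = atomic_load(&ticks);
        nap();
        _exit(atomic_load(&ticks) == before ? 0 : 1);
    }

    /* Parent: the worker is still running alongside us. */
    long before = atomic_load(&ticks);
    nap();
    int parent_moved = (atomic_load(&ticks) != before);

    atomic_store(&stop_worker, 1);
    pthread_join(tid, NULL);
    if (pid == -1 || waitpid(pid, &status, 0) == -1)
        return -1;
    int child_frozen = WIFEXITED(status) && WEXITSTATUS(status) == 0;
    return parent_moved && child_frozen;
}
```

A program would call child_has_no_worker() from main and expect it to return 1. The child deliberately restricts itself to atomic loads, nanosleep, and _exit, so it never touches a lock that the vanished worker might have held.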
Consider the headaches: 
 The new single-threaded child process could inherit held locks from threads in the parent that don't exist in the child. It may have no idea what these locks mean, let alone realize that it holds one of them. Confusion and deadlock are in the forecast. 
 The child process could inherit heap areas that were allocated by threads in the parent that don't exist in the child. Here we see memory leaks, data loss, and bug reports. 
The Pthreads standard defines the pthread_atfork call to help you manage these problems. The pthread_atfork function allows a parent process to specify preparation and cleanup routines that parent and child processes run as part of the fork operation. Using these routines, a parent or child process can manage the release and reacquisition of locks and resources before and after the fork.
This is pretty complex stuff, so please bear with us. 
Fork-handling stacks
To perform its magic, the pthread_atfork call pushes addresses of preparation and cleanup routines on any of three fork-handling stacks: 
 Routines placed on the prepare stack are run in the parent before the fork.
 Routines placed on the parent stack are run in the parent after the fork.
 Routines placed on the child stack are run in the child after the fork.
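A single call to pthread_atfork places a routine on one or more of these stacks. With multiple calls you can place several routines on any given stack. Routines on the prepare stack run in first-in last-out order; in the standard as finally published, routines on the parent and child stacks run in the order in which they were registered. Because the fork-handling stacks are a processwide resource, any thread, not just the one that will call fork, can push routines on them. 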
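The ordering is easy to check on your own system. POSIX.1 as published specifies that prepare routines run in the reverse of their registration order, whereas parent and child routines run in registration order. The sketch below registers two sets of routines and records the order in which they fire. All of the names (trace, prepare1, fork_and_trace, and so on) are ours, and the trace buffer is a simplification that is reasonable only because the forking process in this example has a single thread.

```c
#include <pthread.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static char trace[64];                    /* handlers append their tags here */
static pthread_once_t once = PTHREAD_ONCE_INIT;

static void note(const char *tag) { strcat(trace, tag); }

static void prepare1(void) { note("p1 "); }
static void prepare2(void) { note("p2 "); }
static void parent1(void)  { note("a1 "); }
static void parent2(void)  { note("a2 "); }
static void child1(void)   { note("c1 "); }
static void child2(void)   { note("c2 "); }

/* Register set 1 first, then set 2. Done once per process. */
static void install(void)
{
    pthread_atfork(prepare1, parent1, child1);
    pthread_atfork(prepare2, parent2, child2);
}

/* Forks once. Copies the parent's trace into out (at least 64 bytes).
 * Returns 0 if the child saw "p2 p1 c1 c2 ", 1 if it saw something
 * else, and -1 on error. */
int fork_and_trace(char *out)
{
    int status;

    pthread_once(&once, install);
    trace[0] = '\0';

    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0)                         /* child: check its own trace */
        _exit(strcmp(trace, "p2 p1 c1 c2 ") == 0 ? 0 : 1);

    strcpy(out, trace);                   /* parent: hand back its trace */
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : 1;
}
```

On a conforming system, fork_and_trace should return 0 and leave the string "p2 p1 a1 a2 " in the caller's buffer.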
In those carefree times when we throw caution to the winds and decide to fork from the middle of a multithread program, we typically use pthread_atfork to push mutex-locking calls on the prepare fork-handling stack and mutex-unlocking calls on the parent and child stacks. We might also place routines that release resources and reset variables on the child stack. 
Let's demonstrate what would happen if we did not use pthread_atfork's capabilities in one of those fork-crazy programs of ours. In Figure 5-1, we have two threads, a mutex (Lock L), and the data the mutex protects. Thread A acquires Lock L and starts to modify the data. Meanwhile, Thread B decides to fork. Now, the fork creates a child process that's a clone of its parent process, and this child shows a locked Lock L. The child process has a single thread, a replica of Thread B (the thread in the parent process that called fork). The assortment of clones and replicas that result from the fork has little effect on the threads in the parent process. However, things are not okay in the child. The locked Lock L is an utter mystery to the new Thread B in the child. If it tries to acquire Lock L, it will deadlock. (There's no Thread A in the child that will ever release Lock L in the child process's context.) If it tries to access the data without first obtaining Lock L, it may see the data in an inconsistent form. Life's never easy for our kids. 
Figure 5-1: Results of a fork when pthread_atfork is not used
Now, let's use pthread_atfork to control Lock L's state at the time of the fork. The program we show in Figure 5-2 also has Threads A and B, Lock L, and scrupulously guarded data. However, we've added an initialization routine that pushes a routine that locks L on the prepare fork-handling stack, and a routine that unlocks L on the child and parent fork-handling stacks. We've taken care to do this in a routine that executes before any thread actually uses the lock. 
Figure 5-2: Results of a fork when pthread_atfork is used
Sometime later, Thread A acquires the lock and starts to modify the data. When Thread B calls fork, the routine on the prepare stack runs in Thread B's context. This routine tries to obtain Lock L and blocks, because Lock L is still held by Thread A. The fork is delayed until Thread A releases Lock L. When this happens, the prepare routine succeeds, Thread B becomes the owner of the lock, and the fork proceeds. As expected, a child process is created that's a replica of its parent. However, in this case, the newly cloned Thread B in the child knows about the locked lock it finds in the child's context. At this point, the routine we placed on the child fork-handling stack runs and releases Lock L. The same routine runs from the parent fork-handling stack and releases the lock in the parent process. When the dust settles, the lock is unowned in both parent and child, and the data it protects is in a consistent state. Who could ask for more? 
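The scenario of Figure 5-2 might be coded along the following lines. This is a minimal sketch, not a listing from a real program: lock_l, data, thread_a, and fork_with_lock_handlers are names we made up, the caller plays the part of Thread B, and the "data" is just a pair of integers that are consistent whenever they are equal.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pthread_mutex_t lock_l = PTHREAD_MUTEX_INITIALIZER;  /* Lock L */
static int data[2];              /* guarded by lock_l; consistent when equal */
static pthread_once_t once = PTHREAD_ONCE_INIT;

static void lock_it(void)   { pthread_mutex_lock(&lock_l); }
static void unlock_it(void) { pthread_mutex_unlock(&lock_l); }

/* Prepare routine locks L; parent and child routines unlock it. */
static void install(void)   { pthread_atfork(lock_it, unlock_it, unlock_it); }

/* Thread A: repeatedly modifies the data while holding Lock L. */
static void *thread_a(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock_l);
        data[0]++;               /* data is inconsistent between these lines */
        data[1]++;
        pthread_mutex_unlock(&lock_l);
    }
    return NULL;
}

/* The caller acts as Thread B and forks while Thread A is busy.
 * Returns 0 if the child could take Lock L and found the data
 * consistent, 1 if not, and -1 on error. */
int fork_with_lock_handlers(void)
{
    pthread_t a;
    int status;

    pthread_once(&once, install);        /* before any thread uses the lock */
    if (pthread_create(&a, NULL, thread_a, NULL) != 0)
        return -1;

    pid_t pid = fork();                  /* prepare routine waits for L here */
    if (pid == 0) {
        /* Child: the child routine has already released L. */
        if (pthread_mutex_lock(&lock_l) != 0)
            _exit(2);
        int ok = (data[0] == data[1]);
        pthread_mutex_unlock(&lock_l);
        _exit(ok ? 0 : 1);
    }

    pthread_join(a, NULL);
    if (pid == -1 || waitpid(pid, &status, 0) == -1)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : 1;
}
```

Calling fork_with_lock_handlers() should return 0. If you remove the pthread_once call, so that no routines are registered, the child may hang in pthread_mutex_lock whenever the fork happens to land while Thread A holds the lock.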
Even given the capabilities of pthread_atfork, forking from a multithreaded program is no picnic. We kept our example simple. Imagine having to track every lock and every resource that may be held by every thread in your program and in every library call it makes! Before pursuing this course, you should consider a less complex alternative: 
 If possible, fork before you've created any threads. 
 Instead of forking, create a new thread. If you are forking to exec a binary image, can you convert the image to a callable shared library to which you could simply link? 
 Consider the surrogate parent model. 
In the surrogate parent model, a program forks a child process at initialization time. The sole purpose of the child is to serve as a sort of "surrogate parent" for the original process should it ever need to fork another child. After initialization, the original parent can proceed to create its additional threads. When it wants to exec an image, it communicates this to its child (which has remained single-threaded). The child then performs the fork and exec on behalf of the original process. 
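Here is one way the surrogate parent model could be sketched. Everything specific in it is our own assumption: the pair of pipes, the fixed-size CMD_MAX request, the use of /bin/sh to run the requested command, and the names surrogate_start and surrogate_run. A real program would likely want a richer protocol and more thorough error handling.

```c
#include <pthread.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define CMD_MAX 256              /* fixed-size request (hypothetical protocol) */

static int req_fd = -1;          /* original process writes commands here */
static int rep_fd = -1;          /* original process reads statuses here */
static pthread_mutex_t chan = PTHREAD_MUTEX_INITIALIZER;  /* one request at a time */

/* Runs in the single-threaded surrogate: fork and exec on request. */
static void surrogate(int in, int out)
{
    char cmd[CMD_MAX];

    while (read(in, cmd, sizeof cmd) == (ssize_t)sizeof cmd) {
        int status = -1;
        pid_t pid = fork();
        if (pid == 0) {
            execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
            _exit(127);                      /* reached only if exec fails */
        }
        if (pid > 0)
            waitpid(pid, &status, 0);
        if (write(out, &status, sizeof status) != (ssize_t)sizeof status)
            break;
    }
    _exit(0);                                /* original process went away */
}

/* Call once at initialization, before creating any threads. */
int surrogate_start(void)
{
    int req[2], rep[2];

    if (pipe(req) == -1)
        return -1;
    if (pipe(rep) == -1) {
        close(req[0]); close(req[1]);
        return -1;
    }
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        close(req[1]); close(rep[0]);
        surrogate(req[0], rep[1]);           /* never returns */
    }
    close(req[0]); close(rep[1]);
    req_fd = req[1];
    rep_fd = rep[0];
    return 0;
}

/* Callable from any thread: ask the surrogate to run a shell command.
 * Returns the command's exit code, or -1 on error. */
int surrogate_run(const char *command)
{
    char cmd[CMD_MAX] = {0};
    int status = -1, ok;

    if (strlen(command) >= sizeof cmd)
        return -1;
    strcpy(cmd, command);

    pthread_mutex_lock(&chan);
    ok = write(req_fd, cmd, sizeof cmd) == (ssize_t)sizeof cmd &&
         read(rep_fd, &status, sizeof status) == (ssize_t)sizeof status;
    pthread_mutex_unlock(&chan);

    return (ok && WIFEXITED(status)) ? WEXITSTATUS(status) : -1;
}
```

A program would call surrogate_start() early in main and, later, surrogate_run("exit 7") from any thread; that call should return 7. Note that the multithreaded process itself never forks after initialization, so it needs no fork-handling routines at all.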
Calling exec from a Thread
An exec call changes the program image of a process. For instance, using an exec, a process running the shell program can switch to the vi editor program. After the exec the identity of the process remains the same (that is, it has the same process ID, user ID, etc.), but its virtual memory image is completely new and based on the program it has been asked to run. 
If a thread in a multithreaded process issues an exec, we'd expect that thread to start in the main routine of the new image. And this is essentially what happens. But what of the other threads? It wouldn't be of much use if the system loader picked some random routine entry points for them to start in. With this in mind, the Pthreads standard specifies that an exec call from any thread must terminate all threads in the process and start a single new thread at main in the new image. 
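A small experiment can make this concrete. In the sketch below (the names idle, do_exec, and exec_from_thread are ours), a child process starts two extra threads. One of them calls exec to run a trivial shell command, chosen arbitrarily, while the main thread and the other thread sit in pause. If exec did not terminate those threads, the process would never finish and the waitpid in the parent would hang.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* A bystander thread that never finishes on its own. */
static void *idle(void *arg)
{
    (void)arg;
    for (;;)
        pause();
    return NULL;
}

/* The thread that replaces the process image. */
static void *do_exec(void *arg)
{
    (void)arg;
    execl("/bin/sh", "sh", "-c", "exit 3", (char *)NULL);
    _exit(127);                     /* reached only if exec fails */
}

/* Forks a child that becomes multithreaded and then execs from a
 * thread other than main. Returns the child's exit code, or -1 on
 * error. The caller is assumed to be single-threaded at this point. */
int exec_from_thread(void)
{
    int status;

    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        pthread_t t1, t2;
        if (pthread_create(&t1, NULL, idle, NULL) != 0)
            _exit(126);
        if (pthread_create(&t2, NULL, do_exec, NULL) != 0)
            _exit(126);
        for (;;)                    /* main thread waits; exec ends it */
            pause();
    }
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

We would expect exec_from_thread() to return 3, the exit code of the shell command that replaced the three-thread image.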
Process Exit and Threads
Regardless of whether or not a process contains multiple threads, it can be terminated when: 
 Any thread in it makes an exit system call. 
 The thread running the main routine completes its execution. 
 A fatal signal is delivered. 
When a process exits, all threads in it die immediately, and their resources are released. (If you call _exit directly, the system doesn't guarantee the user-level cleanup that exit performs, such as running routines registered with atexit.) 
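The first case in the list can be sketched as follows, again with names of our own (call_exit, exit_from_thread). A thread other than main calls exit while the main thread is parked in pause. The whole process should terminate with the status that thread supplied.

```c
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* A thread that ends the whole process, not just itself. */
static void *call_exit(void *arg)
{
    (void)arg;
    exit(5);                        /* runs atexit routines, then terminates */
}

/* Forks a child in which a secondary thread calls exit. Returns the
 * child's exit code, or -1 on error. The caller is assumed to be
 * single-threaded and to have no unflushed stdio output, since exit
 * in the child would flush its copy of any buffered data. */
int exit_from_thread(void)
{
    int status;

    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        pthread_t t;
        if (pthread_create(&t, NULL, call_exit, NULL) != 0)
            _exit(126);
        for (;;)                    /* main never returns by itself */
            pause();
    }
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

We would expect exit_from_thread() to return 5. Had the thread called pthread_exit instead, only that thread would have ended, and the child would have stayed in pause indefinitely.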
