Add Book to My BookshelfPurchase This Book Online

Chapter 3 - Synchronizing Pthreads

Pthreads Programming
Bradford Nichols, Dick Buttlar and Jacqueline Proulx Farrell
 Copyright © 1996 O'Reilly & Associates, Inc.

Mutex Variables
To protect a shared resource from a race condition, we use a type of synchronization called mutual exclusion, or mutex for short. Using mutexes, we give threads turns at having exclusive access to data. When one thread has exclusive access to data, other threads cannot simultaneously be accessing the same data.
So far, we've focused almost entirely on providing exclusive access to data. However, we could take a different perspective and provide exclusive access to the code paths or routines that access data. We call that piece of code that must be executed atomically a critical section.
How large does a critical section have to be to require protection through a mutex? Not very large at all—even a single statement might need to be guarded by a mutex. To answer this question for your program, it's important for you to understand something about what your C language statements might look like at an instruction level. Where your C language program might have a single assignment statement, the compiler might substitute a number of machine instructions operating on one or more memory locations. In a multithreaded environment, the original single statement is no longer atomic at the hardware level. For example:
 Double-precision floating-point multiplies and adds on many systems require multiple loads and stores.
 A platform may have alignment restrictions that cause an integer to be accessed by multiple loads or stores when it straddles an alignment boundary.
Be conservative. Because a platform's machine architecture ultimately decides which operations are performed atomically and which are not, you should always use mutexes to ensure a thread's shared data operations are atomic with respect to other threads. For the time being, you can assume that the Pthreads standard arranges things so that Pthreads library operations (such as mutex locks and unlocks) work properly regardless of the platform you are using and the number of CPUs in the system. We'll provide enough background on this topic in Chapter 5, to make you confident that this is so.
Using mutex variables in Pthreads is quite simple. Here's what you do:
 1.Create and initialize a mutex for each resource you want to protect, like a record in a database.
 2.When a thread must access the resource, use pthread_mutex_lock to lock the resource's mutex. The Pthreads library makes sure that only one thread at a time can lock the mutex; all other calls to the pthread_mutex_lock function for the same mutex must wait until the thread currently holding the mutex releases it.
 3.When the thread is finished with the resource, unlock the mutex by calling pthread_mutex_unlock.
It's up to you to put lock and unlock calls in the right place. Unlike some higher-level programming interfaces, the Pthreads library does not enforce locks. Pthreads locks are merely advisory. If each thread locks the mutex when it's supposed to, the system works; if each thread does what it feels like, the data goes unprotected. If your locking code is correct, the thread that holds a lock on a mutex can assume that:
 No other thread will write to the data. Data protected by the mutex will not change out from under it.
This is important because a thread may take some action based on the current value of the data. For instance, the ATM example allows a withdrawal whenever the bank balance is greater than the amount to be withdrawn. You certainly wouldn't want a thread to come in and decrease the balance while another thread is giving out the money.
 No other thread will read the data while it is in some sort of intermediate state. After this thread releases the lock, other threads will see only the final data it has written.
This is important because a thread might need many steps to process the data. The only way to make the data appear atomic to other threads is to prevent them from seeing its intermediate states.
You can use a single mutex lock in our ATM server example to protect the account database from corruption. We'll globally define the mutex and call it global_data_mutex. Our server's main routine will statically initialize global_data_mutex before it creates any worker threads:
pthread_mutex_t global_data_mutex = PTHREAD_MUTEX_INITIALIZER;
Once it's initialized, the mutex can be used by any worker thread accessing the database, such as the threads that run the deposit routine in Example 3-1.
Example 3-1: Using a Single Mutex Lock for the ATM Database (atm_svr.c)
void deposit(char *req_buf, char *resp_buf)
  int rtn;
  int temp, id, password, amount;
  account_t *accountp;
  /* Parse input string */
  sscanf(req_buf, "%d %d %d %d ", &temp, &id, &password, &amount);
  /* Check inputs */
  if ((id < 0) || (id >= MAX_NUM_ACCOUNTS)) {
    sprintf(resp_buf, "%d %s", TRANS_FAILURE, ERR_MSG_BAD_ACCOUNT);
  /* Retrieve account from database */
  if ((rtn = retrieve_account( id, &accountp)) != 0) {
    sprintf(resp_buf, "%d %s", TRANS_FAILURE, atm_err_tbl[-rtn]);
Although we've shown only the deposit routine in Example 3-1, all the routines in our server that access the database work in the same way. They lock the mutex before retrieving the account balance, then release it after they've changed the account balance. This may be the simplest solution for our ATM server, but it's not the best. We've limited access to the entire database to a single thread at a time, thus slowing performance considerably. Later in this chapter, as a performance enhancement, we'll add mutexes for individual account records to our server.
Using Mutexes
Mutex variables are of type pthread_mutex_t. Before you can use a mutex in your program, you must initialize it, either dynamically or statically. We previously showed an example of static initialization.
You dynamically initialize a mutex by calling pthread_mutex_init as shown in Example 3-2.
Example 3-2: Dynamically Initializing a Single Mutex Lock (atm_svr.c)
pthread_mutex_t *mutexp;
  mutexp=(pthread_mutex_t *)malloc(sizeof(pthread_mutex_t));
  pthread_mutex_init(mutexp, NULL);
When Pthreads initializes a mutex, it defines an attribute object for the mutex (pthread_mutex_attr_t) that you use to customize its behavior. To assign default attributes to a mutex, pass a NULL attribute argument in the pthread_mutex_init call. Unlike the pthread_attr_t object, which we introduced in our discussion of pthread_create in Chapter 1, the pthread_mutex_attr_t object has no mandatory attributes. We'll discuss its optional attributes—a process-shared attribute and two priority-inversion attributes (priority ceiling and priority inheritance)—a bit later.
Because you want to protect data from being accessed by more than one thread at a time, the main routine usually initializes all mutexes before it creates additional threads. Sometimes this is impractical—for instance, in a system library (it has no main!).When this is the case, use the pthread_once function, which we'll cover in Chapter 4.
Once you've initialized a mutex, you can lock it by calling pthread_mutex_lockor pthread_mutex_trylock. The pthread_mutex_lock call blocks the calling thread until it's granted the lock. If the mutex is unlocked at the time of the call, the lock's granted immediately; otherwise, it's granted after it's released by the thread that's holding it. We'll discuss pthread_mutex_trylock momentarily.
To release a lock, use pthread_mutex_unlock. If you should forget to call pthread_mutex_unlock for a locked mutex, a deadlock may occur in which other threads that are requesting the lock wait indefinitely for you to release it.
Error Detection and Return Values
The Pthreads standard allows implementations to define the exact level of error detection and reporting for some library calls. Although this allows vendors to design efficient library calls, it can pose a particular problem when you use mutex library calls.
In general, the Pthreads library reports all errors associated with resource availability and system constraints on function operation. For example, if the library realizes that it cannot initialize a mutex for a thread because the library itself hasn't enough space in its internal tables, it returns a value of EAGAIN or ENOMEM to the caller of pthread_mutex_init. However, the library does not have to detect improper uses of a mutex and report any errors that might result. Such improper uses include:
 Locking a mutex that you have not initialized
 Locking a mutex that you already own
 Unlocking a mutex that you don't own
Hopefully, the library you use does detect these misuses. If it does not in its default mode, see if it has a debug mode that provides additional error detection.
Using pthread_mutex_trylock
The pthread_mutex_trylock function, like pthread_mutex_lock, locks a previously initialized mutex. Unlike pthread_mutex_lock, though, it does not suspend its caller if another thread already holds the mutex. Instead, it returns immediately, indicating that the mutex is currently locked. The pthread_mutex_trylock function can be useful, but using it is not as simple as it seems.
Be careful.
Philosophically, using pthread_mutex_trylock seems contrary to the basics of multithreaded program design. We are callingpthread_mutex_trylock to prevent a thread from blocking, but we've designed threads into our program so that some threads could block while others continue. When we see a pthread_mutex_trylock call, we often wonder why the program's designer didn't simply create another thread for whatever it is that the thread might do while it would be waiting for the lock. This would make the program easier to understand rather than having the one thread, essentially assigned to more than one task, asynchronously bouncing between tasks based on the availability of locks.
Practically, using pthread_mutex_trylock represents a kind of polling for a resource—repeatedly trying and backing off until the resource is obtained. This polling leads to some overhead and, worse, potential resource starvation. If the lock is in high demand, the thread that polls for it may never get it. It's like trying to get tickets for a concert by a really hot band—Pink Floyd, for instance. The line forms well before the tickets go on sale and lasts until they are all gone. If you don't keep your place in line, you may never get your tickets. Similarly, a thread that is not patient enough to block and wait may never try the lock and find it available—there is always at least one other thread blocked waiting for the lock. Somewhat more acceptable is the specialized use of pthread_mutex_trylock by real-time programmers to poll for state changes. This practice may be inefficient, but it does allow real-time programs to respond quickly to a condition that warrants speed.
Another situation in which a pthread_mutex_trylock is often used is in detecting and avoiding deadlock in locking hierarchies and priority inversion situations. Later in this chapter, we'll discuss a more standard solution to locking hierarchy problems that involves defining an order in which any given thread must pursue locks. In Chapter 4, we'll discuss how you can avoid priority inversion problems by using attributes to assign priorities to mutexes.
When Other Tools Are Better
Mutexes are best used for controlling direct access to data and resources.  Although you can use mutexes as building blocks for more complex synchronization mechanisms, Pthreads often provides a more appropriate tool for doing so.
In particular, a common task in thread programming is event synchronization: each thread in a program reaches a certain point and must wait for other threads to get there. You might adopt this technique, for instance, when your threads are working on different chunks of an array and must exchange results at regular points. Your best choice to impose this type of synchronization is a condition variable. If condition variables were not available, you'd likely use a counter to let threads know when they've all reached a barrier in your program. Not only would each thread need to lock a mutex to decrement the counter, but it would also have to repeatedly lock the mutex to check if the counter had reached zero. If you find code that polls a counter to determine if all threads have synchronized on an event, it's time to use a condition variable. We'll have more to say about condition variables later in this chapter.
Some Shortcomings of Mutexes
Mutexes are the most restrictive type of access control. When a thread locks a mutex on a resource—even if it's only interested in checking the resource's value—it prevents all other threads from accessing the resource. This is effective synchronization for all situations but may not be the most efficient type of lock for situations that allow less restrictive access.
Sometimes you have many threads that read data but only an occasional thread that writes it. There should be a type of lock that allows any number of readers but works like a mutex whenever a writer enters the scene. That is, the writer should not be allowed access whenever any readers are using the data. But when a writer is using it, neither readers nor other writers are allowed in. Reader/writer locks provide this type of access control. Although Pthreads does not specify them, we'll show you later on how to "roll your own" using mutexes and condition variables.
In some circumstances, it would be useful if we could define a recursivelock: that is a lock that can be relocked any number of times by its current holder. It would be nice if we could specify this ability in a mutex attribute object. We can imagine the Pthreads library associating an internal counter with a recursive mutex to count the number of times its current holder has called pthread_mutex_lock. Each time the current holder calls pthread_mutex_unlock, the library would decrement this counter. The lock would not be released until the call that brings the count down to zero is issued.
A recursive mutex is useful for a thread that makes a number of nested calls to a routine that locks and manipulates a resource. You lock the mutex recursively each time the thread enters the routine and unlock it at all exit points. If the thread already holds the lock, the calls merely increase and decrease the recursive count and don't deadlock the thread. If you did not use a recursive mutex, you'd need to distinguish somehow between the times when the thread already holds the lock when it calls the routine and those when it needs to make a prior mutex lock call.
Contention for a Mutex
If more than one thread is waiting for a locked mutex, which thread is the first to be granted the lock once it's released? The choice is made according to the scheduling priorities of the individual threads.
The thread with the highest priority gets the lock. We'll discuss scheduling policies and priorities in Chapter 4. For now, it's worth noting that they allow you to mark one thread as more important than another.
Many threaded programs, however, don't assign different priorities to different threads. Most of these programs are designed for real-time applications and allow the choice of which thread gets a lock first to be made randomly.
The use of priorities in a multithreaded program can lead to a classic multiprocessing problem: priority inversion. Priority inversion involves a low priority thread that holds a lock that a higher priority thread wants. Because the higher priority thread cannot continue until the lower priority thread releases the lock, each thread is actually treated as if it had the inverse of its intended priority.
The best way to avoid priority inversion is to minimize the degree to which threads of different priorities share the same locks. This may not always be possible, though. In Chapter 4, we'll show you how to eliminate the risk of priority inversion by using mutex attributes.
Example: Using Mutexes in a Linked List
Linked lists are common structures in programming—and in programming books! But, when multiple threads become involved, there's a new twist: how do multiple threads access a list without screwing it up? In this version of the venerable linked-list example, we'll have multiple threads accessing a list, searching for a node (that is, reading the list), removing a node, and changing its contents (that is, writing to the list).Our include file for this example is shown in Example 3-3.
Example 3-3: Include File for a Linked List Module (llist.h)
/* llist.h */
typedef struct llist_node {
   int            index;
   void          *datap;
   struct llist_node    *nextp;
} llist_node_t;
typedef llist_node_t *llist_t;
int llist_init(llist_t *llistp);
int llist_insert_data(int index; void *datap, llist_t *llistp);
int llist_remove_data(int index; void **datapp, llist_t *llistp);
int llist_find_data(int index,  void **datapp, llist_t *llistp);
int llist_change_data(int index, void *datap, llist_t *llistp);
int llist_show(llist_t *llistp);
We've set up calls in our llist.h file to initialize a linked list of type llist_t, insert nodes, remove nodes, retrieve data from nodes, and set variables in nodes. In this simple example, each node has an integer index that indicates its place in the list. A partial implementation of the module including the initialization routine (llist_init) and the insert routine (llist_insert_data) is shown in Example 3-4.
Example 3-4: Nonthreaded Linked List Code (llist.c)
/* llist.c */
#include "llist.h";
/* Right now, this routine simply ensures that we don't initialize a list
   that has data on it. */
int llist_init(llist_t *llistp)
   if (*llistp == NULL)
          return 0;
          return -1;
int llist_insert_data(int index, void *datap, llist_t *llistp)
   llist_node_t *cur, *prev, *new;
   int found = FALSE;
   for (cur = prev = *llistp; cur != NULL; prev = cur, cur= cur->nextp) {
          if (cur->index == index) {
                    cur->datap = datap;
                    found = TRUE;
          } else if (cur->index > index) {
1   if (!found) {
2          new = (llist_node_t *)malloc(sizeof(llist_node_t));
3         new->index = index;
4          new->data = datap;
5          new->nextp = cur;
6          if (cur == list)
7                    *llistp = new;
8          else
                     prev->nextp = new;
   return 0;
As we've written it so far, our linked list code would present many opportunities for race conditions if we divided its tasks among multiple threads. If we allowed two or more threads to concurrently execute these routines, unexpected results might arise.
Complex Data Structures and Lock Granularity
When a linked list is being accessed by more than one thread, we're not only concerned about the data it contains, but also about the integrity of its structure. Consider a situation in which two threads are inserting nodes in our list at the same time, with the result that the execution of lines 1 through 8 in Example 3-4 are interleaved. Each thread has passed the test at line 6 and thus thinks that it should insert its node on the top of the list. When it executes line 7, each thread makes the head of the list point to the node it will be inserting. Whichever thread is the last to do so will succeed in inserting its node on the list; the other thread's node will be lost forever, occupying inaccessible memory somewhere on the heap. A similar mishap could result from the race condition in which a thread reads a node that is being concurrently removed by another thread.*
 *Our expectations on the structure and value of that data are sometimes called invariants. When we have a linked list, we expect it to remain well formed as long as our program is executing. It should always have a valid head and tail and correctly linked nodes that don't disappear—no matter which thread in our program references it.
Programmers often want to use a preexisting code library or module with a multithreaded program only to discover that to do so might create race conditions among the threads. We say that it contains non-threadsafe,or nonreentrant, code. In our linked list example, we'll rewrite the previous module to make it thread safe. In other cases, we may not have had the option of rewriting our code, perhaps because we were using a precompiled library of code. If this were the situation, we'd add a mutex as a wrapper around our calls to library functions. We'll discuss the issues that arise from using non-threadsafe code in Chapter 5.
Requirements and Goals for Synchronization
When designing the synchronization for our data structures, we'll keep to two strict requirements:
 Eliminate all race conditions.
 Do not introduce deadlock.
We'll try to meet these requirements with as little impact on the performance of our program as possible.
The lock potentially blocks other threads that must access the resource it protects. You can control this expense to some extent by economizing on the length of time a thread spends in a critical section of code. (It's how long this code takes to complete that determines how long other threads must wait on a mutex.) We'll revisit performance issues in Chapter 6, Practical Considerations.
The simplest way to synchronize our program would be have a single mutex protect all types of access to the entire list—insertion, deletion, reading data, and writing data. This approach would eliminate all race conditions for these operations and prevent deadlock, thus meeting our requirements. We'll need to first modify the llist_t data structure in our header file, as follows:
typedef  struct llist {
    llist_node_t *first;
    pthread_mutex_t mutex;
    } llist_t;
We'll then change our initialization and insert routines as shown in Example 3-5.
Example 3-5: Multithreaded Linked List Code (llist_threads.c, llist_threads.h)
int llist_init(llist_t *llistp)
  llistp->first = NULL;
  pthread_mutex_init(&(llistp->mutex), NULL);
  return 0;
int llist_insert_data(int index, void *datap, llist_t *llistp)
   llist_node_t *cur, *prev, *new;
   int found = FALSE;
   for (cur = prev = llistp->first; cur != NULL; prev = cur, cur= cur->nextp) {
          if (cur->index == index) {
                    cur->datap = datap;
                    found = TRUE;
          } else if (cur->index > index) {
   if (!found) {
          new = (llist_node_t *)malloc(sizeof(llist_node_t));
          new->index = index;
          new->data = datap;
          new->nextp = cur;
          if (cur == llistp->first)
                    llistp->first = new;
          else prev->nextp = new;
   return 0;
The llist_t structure now includes a mutex lock that protects the entire list. The llist_init routine initializes the mutex in the pthread_mutex_init call that follows the malloc that allocates the list. Each routine that accesses the list, like llist_insert_data in Example 3-5, must first obtain this mutex (by calling pthread_mutex_lock).It releases the mutex (by calling pthread_mutex_unlock)just before exiting.
By putting the synchronization inside our module, we've created a thread safe data type without changing its interface.
Although this solution does meet our design requirements, it may not provide the best possible performance. The mutex controls access to the entire list. That is, while the list is being accessed by any thread, it is unavailable to all other threads. If concurrent accesses to the list are uncommon, this may be fine; but what if this isn't true?
Access Patterns and Granularity
Your choices for optimizing performance in a multithreaded program are tied to how its threads access the shared data on the list. If almost all accesses are reads and writes of existing data nodes, as opposed to insertions and removes, your most efficient approach might be to allow nodes to be individually locked. This would allow threads to read and write different nodes simultaneously. However, if threads often insert and remove nodes to and from the list, this solution would add another layer of complexity.
This basic design decision concerns lock granularity—that is, the level at which we apply locks to our shared data (and, thus, the number of locks we use to protect the data). On one hand, we could use coarse-grain locking and use a single mutex to control access to all shared data. This is what we've been using up to this point. On the other hand, we could use fine-grain locking and place a lock on every individually accessible piece of data. Fine-grain locking may result in improved concurrency, but it requires the most coding and overhead in synchronization calls.
In practice, locking systems adopt a lock granularity design that falls somewhere in between these extremes. The programmer takes anticipated usage into account—for instance, whether it's likely for more than one thread to request withdrawals on the same account at the same time. It's an art to provide the most efficient implementation while ensuring that the application works correctly.
Locking Hierarchies
If your shared data has some hierarchical structure—for instance, it's a tree with some depth to it—you may want to allow threads to lock separate subtrees of the structure simultaneously. This assumes a finer-grain lock design.
Figure 3-2 shows a tree with a root and three levels—L1, L2, and L3. We've assigned a mutex to each level to control access to the sublevels below.
Figure 3-2: Locking in hierarchical data structures
This is all well and good, but beware! If we now allow threads to acquire these locks in any order they please, the kind of deadlock known as a deadly embrace can occur. For example, consider two threads that intend to lock the same section of the tree (see Figure 3-3).The first thread tries to obtain the L1, L2, and L3 locks in succession, and the second thread goes after L3, L2, and L1 in the reverse order at the same time. If their execution overlapped, it's quite possible that each would stall waiting for a lock already held by the other. (Here, the first thread blocks waiting for the L3 lock that the second thread holds, and the second thread blocks waiting for the L2 lock that the first thread holds.) Our threads are deadlocked waiting for each other.
Figure 3-3: Deadly embrace in a locking hierarchy
To avoid a deadly embrace such as this, we must enforce a fixed locking hierarchy. To access data at any given level, all threads in our program must obtain the lock at each lower level in exactly the same order. If threads always took L1 before L2, and L2 before L3, any thread obtainingL1 can assume that L2 is either unlocked or locked by another thread. It can also assume that the other thread that currently owns L2 will not try to lock L1. Presumably, the second thread will release L2 sometime, giving the first thread an opportunity to proceed through the hierarchy. Thus, our threads avoid deadlock. Note that this scheme allows a thread with a lock in the hierarchy to release locks of lower levels so that other threads can pursue data off other branches of the subtree.
Some locking systems have built-in support for locking hierarchies—Pthreads isn't one of them. In these systems, you can define each lock's place in the hierarchy to the locking system. When you subsequently try to obtain a lock, the system checks if you own all required prior locks. If not, it gives you an error.
In systems without this support, locking hierarchies must exist entirely in the programmer's head or, perhaps, be written profusely into program comments.
Sharing a Mutex Among Processes
A mutex has a single attribute that determines whether or not it can be seen by threads in other processes: process-shared. (A mutex object also has two attributes that assist in scheduling threads. We'll discuss them in Chapter 4.) If your platform allows you to set the process-shared attribute, the compile-time constant _POSIX_THREAD_PROCESS_SHARED will be TRUE.
When you initialize a mutex dynamically (that is, by calling pthread_mutex_init),the Pthreads library creates a mutex attribute object for it. A Pthreads mutex attribute object is of type pthread_mutex_attr_t.You initialize and deinitialize it by calling pthread_mutexattr_initand pthread_mutexattr_destroy, respectively. To set the process-shared attribute, supply the PTHREAD_PROCESS_SHARED constant in a pthread_mutexattr_setshared call. To revert to a process-private mutex, specify the PTHREAD_PROCESS_PRIVATE constant. Processes that share a mutex must be able to access it in shared memory (created through System V shared memory mechanisms or through mmap calls). The mutex is initialized once by a thread in any of the processes that plan to use it. Example 3-6 shows one way of initializing a process-shared mutex.
Example 3-6: A Process-Shared Mutex (process_shared_mutex.c)
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>
#error "This platform does not support process shared mutex"
int   shared_mem_id;
int   *shared_mem_ptr;
pthread_mutex_t *mptr;
pthread_mutex_attr_t mutex_shared_attr;
extern int
  pid_t  child_pid;
  int  status;
  /* initialize shared memory segment */
  shared_mem_id = shmget(IPC_PRIVATE, 1*sizeof(pthread_mutex_t), 0660);
  shared_mem_ptr = (int *)shmat(shared_mem_id, (void *)0, 0);
  mptr = shared_mem_ptr;
  pthread_mutexattr_setshared(&mutex_shared_attr, PTHREAD_PROCESS_SHARED);
  pthread_mutex_init(mptr, &mutex_shared_attr);
  if ((child_pid = fork()) == 0) {
  /* child */
           /* create more threads */
  } else {
  /* parent */
           /* create more threads */
In Example 3-6, we allocate storage for the mutex from the shared memory segment. The main thread in the parent process initializes it, using a mutex attribute object we've set to the PTHREAD_PROCESS_SHARED constant in a call to the function pthread_mutexattr_setshared.After it initializes the mutex, the parent process forks. Subsequently, the main threads of both the parent and child processes can create more threads, all of which can use the mutex to synchronize access to mutually shared data. When using a process-shared mutex, consider the following:
 Once a process has multiple threads, forking has many pitfalls. (If youwish to steer this course, see Chapter 5, before going any further.) In Example 3-6, we took care to initialize the mutex before forking and to fork before we created multiple threads from multiple processes.
 For strict, process-to-process synchronization, use System V or POSIX.1bsemaphores. For thread-to-thread synchronization across processes, youcan use POSIX.1b semaphores as an alternative to process-shared mutexes.

Previous SectionNext Section, Inc © 2000 –  Feedback