Karoo Project Utilities

Threadsafe Utilities

Introduction to Threadsafe Utilities

Computers with multiple CPUs only really became commonplace fairly recently. When operating systems merely use time slicing to simulate true multi-threading, then there is not much risk of race conditions. But when there are more than one processor, it is more likely that more than one thread may be accessing the same memory at once. Due to the fact that the risk has only recently become a reality, some top quality libraries are not actually thread safe.

Even the STL and IOS and Strings libraries are not thread safe.

One way to overcome this problem is to use semaphore locking around every call to any of these libraries, but that is inefficient, and too much locking can cause deadlocks.

There were other motivations for making a new string class, and a hash-map class:

Motivation for making a threadsafe string class

The most compelling motivation for the new string class was that it should be thread safe, but it also has some new features:

E.g. the old string class had a c_str() method that returned a const char* into the actual memory used. This is, by design, not thread safe, because that memory can be modified by another thread, and there is no way for the string class itself to lock out other threads from modifyiong that memory, because it will be out of the control of the string class itself. Similarly, the iterator into the string exposes it to race conditions. Anything that is effectively returning a pointer, or a reference to the data, exposes a class to race conditions, because the access to that data then falls out of the control of the class itself.

The new string class is called "text".

Motivation for making a threadsafe hash-map class

A hash-map lends itself to being efficiently threadsafe, while a tree (binary or red-black) does not. The hash-map can fairly simply be implemented with the ability to be simultaneously readable (at least in part) while being updated. This is because the basic structure does ot change often (as compared to a tree which is constantly being balanced. So it is quite practical to lock just the one element that is being updated... except of course when it needs re-hashing... in which the whole structure will be locked. The implementation of the locks are that they allow multiple simultaneous reads, but the write will block until all current reads have completed, and while it is waiting, all new reads will block.

The hash-map in The Karoo Project uses an array of buckets, each corresponding to one possible hashed key, and each containing a variable length array of entries. The hashing algorithm is built in to the text class. It uses the sdbm algorithm, which creates an evenly distributed hash.

When a new element is added to the hashmap, first the load-factor is checked. If the hash-map is too loaded, then it needs to be re-hashed. In this case, a write-lock is obtained for the bucket array. Then the buckets and their chained elements are iterated over and a new (larger) bucket array is created. Then the bucket array's write-lock is released. Once the (infrequent) rehashing is concluded, then the element's hash is calculated from the key, and a read-lock is obtained for the buckets. Then the bucket is located, and that backet is locked for write. The element is inserted in the chain, and the locks are released.

So you can see how that only part of the hash-map is locked at any one time, and the whole hash-map is only locked out for read when it's being (relatively infrequently) re-hashed. If the hashmap is read more than it is written, then the read-locks will allow many threads to read at once. Its only when something is being inserted or erased, that one particular bucket will be locked for write, but at the same time, all the other buckets can be read from... or written to for that matter.

However, having said all this, the map and multimap in the STL are excellent, though not thread-safe. There is no simple way to make them partially locked (as with a hashmap). So the best way to use them (if you need a red/black tree, as I did, in the queue class), is to use a semaphore lock around every call to it.

The hashmap's iterator does not return elements in order. The order is not quite random, but may as well be for all intents and purposes. So if you need an ordered iterator, you need a bin-tree (or even better, a red/black tree) such as the map and multimap templates in STL.


Generated on Tue Feb 16 15:04:29 2010 for Karoo by  doxygen 1.5.8