[KEEP IN SYNC WITH LOCKFREE.H] overview -------- this module provides several implicitly thread-safe data structures. rather than allowing only one thread to access them at a time, their operations are carefully implemented such that they take effect in one atomic step. data consistency problems are thus avoided. this novel approach to synchronization has several advantages: * deadlocks are impossible; * overhead due to OS kernel entry is avoided; * graceful scaling to multiple processors is ensured. mechanism --------- the basic primitive that makes this possible is "compare and swap", a CPU instruction that performs both steps atomically. it compares a machine word against the expected value; if equal, the new value is written and an indication returned. otherwise, another thread must have been writing to the same location; the operation is typically retried. this instruction is available on all modern architectures; in some cases, emulation in terms of an alternate primitive (LL/SC) is necessary. memory management ----------------- one major remaining problem is how to free no longer needed nodes in the data structure. in general, we want to reclaim their memory for arbitrary use; this isn't safe as long as other threads are still accessing them. the RCU algorithm recognizes that all CPUs having entered a quiescent state means that no threads are still referencing data. lacking such kernel support, we use a similar mechanism - "hazard pointers" are set before accessing data; only if none are pointing to a node can it be freed. until then, they are stored in a per-thread 'waiting list'. this approach has several advantages over previous algorithms (typically involving reference count): the CAS primitive need only operate on single machine words, and space/time overhead is much reduced. usage notes ----------- useful "payload" in the data structures is allocated when inserting each item: additional_bytes are appended. rationale: see struct Node definition. since lock-free algorithms are subtle and easy to get wrong, an extensive self-test is included; #define PERFORM_SELF_TEST 1 to activate. terminology ----------- ; "atomic" : means indivisible; in this case, other CPUs cannot interfere with such an operation. ; "race conditions" : are potential data consistency problems resulting from lack of thread synchronization. ; "deadlock" : is a state where several threads are waiting on one another and no progress is possible. ; "thread-safety" : is understood to mean the preceding two problems do not occur. ; "scalability" : is a measure of how efficient synchronization is; overhead should not increase significantly with more processors. ; "linearization point" : denotes the time at which an external observer believes a lock-free operation to have taken effect.