Locking and synchronization techniques need to be used in a way that reduces their impact on performance.

This could involve restructuring the code to move work into or out of critical sections.

We can use manual scoping, as well as explicitly telling the compiler when a value's lifetime ends, to control when a Mutex guard goes out of scope and releases the lock.
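
For example, a minimal Rust sketch (expensive_computation is a hypothetical stand-in for work that does not need the lock):

    use std::sync::Mutex;

    fn expensive_computation() { /* work that does not need the lock */ }

    fn main() {
        let counter = Mutex::new(0);

        // Manual scoping: the guard is dropped, and the lock released,
        // at the end of the inner block.
        {
            let mut guard = counter.lock().unwrap();
            *guard += 1;
        } // lock released here

        // Explicit drop: end the guard's lifetime early so the lock is
        // released before the unrelated expensive work below.
        let mut guard = counter.lock().unwrap();
        *guard += 1;
        drop(guard);

        expensive_computation();
    }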

Locking Granularity

Lock granularity describes how much data a given lock protects. Coarse-grained locking is easier to implement and reason about, but it reduces opportunities for parallelism; fine-grained locking allows more parallelism, at the cost of more effort and overhead.

Lock Overhead

To use a lock, you pay:

  • Allocated memory for locks
  • Initialization and destruction time
  • Acquisition and release time

Locking Contention

Contention is the time wasted waiting for a lock held by another thread to become available. We can reduce it by:

  • Making the locked regions smaller
  • Making more locks for independent sections (see the sketch after this list)
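
For instance, a minimal sketch of giving independent data independent locks (the StatsCoarse and StatsFine types are made up for illustration):

    use std::sync::Mutex;

    // One coarse lock serializes updates to both counters, even though
    // they are independent...
    struct StatsCoarse {
        counts: Mutex<(u64, u64)>, // (hits, misses)
    }

    // ...while one lock per independent field lets two threads update
    // hits and misses at the same time.
    struct StatsFine {
        hits: Mutex<u64>,
        misses: Mutex<u64>,
    }

    fn main() {
        let stats = StatsFine {
            hits: Mutex::new(0),
            misses: Mutex::new(0),
        };
        // These two critical sections no longer contend with each other.
        *stats.hits.lock().unwrap() += 1;
        *stats.misses.lock().unwrap() += 1;
    }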

Deadlocks

This is when you wait for a lock held by process X while holding a lock that X is waiting for. The formal conditions are:

  • Mutual exclusion: A resource can belong to at most one process at a time.
  • Hold-and-wait: A process that is currently holding some resources may request additional resources and may be forced to wait for them.
  • No preemption: A resource cannot be taken away from a process that currently holds it; the process must release it voluntarily.
  • Circular-wait: There is a cycle in the resource-allocation graph.

To avoid these, be careful whenever your code acquires a lock while already holding another. One common discipline, sketched below, is to always acquire locks in a consistent global order, which breaks the circular-wait condition.
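
A minimal sketch of a consistent global lock order, assuming a hypothetical Account type with a stable id to order by:

    use std::sync::Mutex;

    struct Account {
        id: u64,
        balance: Mutex<i64>,
    }

    // Two concurrent transfers in opposite directions cannot deadlock,
    // because both always lock the lower-id account first.
    fn transfer(from: &Account, to: &Account, amount: i64) {
        assert!(from.id != to.id); // assume distinct accounts
        let (mut src, mut dst) = if from.id < to.id {
            let src = from.balance.lock().unwrap();
            let dst = to.balance.lock().unwrap();
            (src, dst)
        } else {
            let dst = to.balance.lock().unwrap();
            let src = from.balance.lock().unwrap();
            (src, dst)
        };
        *src -= amount;
        *dst += amount;
    }

    fn main() {
        let a = Account { id: 1, balance: Mutex::new(100) };
        let b = Account { id: 2, balance: Mutex::new(0) };
        transfer(&a, &b, 25);
        assert_eq!(*b.balance.lock().unwrap(), 25);
    }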

Coarse-Grained Locking

Coarse-grained locking uses few locks, which reduces the opportunity for deadlock. However, your program quickly becomes effectively sequential. A prominent example is the Python global interpreter lock (GIL).

Fine-Grained Locking

This maximizes parallelism but introduces complexity. You waste memory and overhead time on the extra locks if your program isn't actually very parallel, and you become more prone to errors and deadlocks.

Reentrancy

Reentrant code depends only on its arguments and local state, so each invocation is independent of every other, which makes it thread-safe. See Pure Functions and Functional Programming.
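
A small illustrative sketch (both function names are made up): the first version hides state between calls, while the second takes all of its state through parameters:

    use std::cell::Cell;

    thread_local! {
        // Hidden state between calls makes this non-reentrant:
        // re-entering it (say, from a recursive callback) clobbers
        // the saved value.
        static LAST_TOTAL: Cell<u64> = Cell::new(0);
    }

    fn running_total_nonreentrant(x: u64) -> u64 {
        LAST_TOTAL.with(|last| {
            let total = last.get() + x;
            last.set(total);
            total
        })
    }

    // Reentrant version: all state comes in through the parameters, so
    // each invocation is independent, and the function is thread-safe.
    fn running_total(previous: u64, x: u64) -> u64 {
        previous + x
    }

    fn main() {
        assert_eq!(running_total_nonreentrant(5), 5);
        assert_eq!(running_total(0, 5), 5);
    }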

Lock Convoys

This happens when two or more threads contend for a lock, a bit like a traffic jam. It is called a convoy because the threads move in a tight group. Think of a train pulling containers: when the engine moves a little, each car in turn moves a bit as the slack between cars allows, until there is no slack left. The threads behave the same way: each thread takes a small step forward before it stops, then another thread gets a turn and also moves forward a tiny bit before stopping. We end up spending all the CPU time on context switches rather than on executing the actual code.

Consequences

Threads in the convoy acquire the lock frequently but hold it only briefly. Meanwhile, unrelated threads of the same priority get to run for an unusually long amount of time.

Mitigation Techniques

Unfair Locks

  • Unfair locks do not guarantee first-come-first-served access to a resource; any waiting thread may acquire it, which helps prevent lock convoys.

Sleeping

  • Make threads not in the convoy call sleep regularly to give other threads a chance to run.

Sharing

  • Use a reader-writer lock to allow more concurrency than we would get with exclusive locking. If there are a lot of writes, the benefit is capped.

Caching

  • Shrink the critical section to just take a copy of the shared data, then operate on that copy; this reduces the amount of time the lock is held.

Trylock

  • Try to acquire the lock; if you fail, yield the CPU to another thread. Basically we do a try-then-yield cycle for a set number of retries, and after that we just wait (see the sketch after this list).
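
A minimal sketch of the trylock cycle (MAX_RETRIES and the increment function are made up for illustration):

    use std::sync::{Arc, Mutex};
    use std::thread;

    const MAX_RETRIES: u32 = 5;

    fn increment(counter: &Mutex<u64>) {
        for _ in 0..MAX_RETRIES {
            // Try to acquire the lock without blocking.
            if let Ok(mut guard) = counter.try_lock() {
                *guard += 1;
                return;
            }
            thread::yield_now(); // give another thread a chance to run
        }
        // Out of retries: just wait for the lock.
        *counter.lock().unwrap() += 1;
    }

    fn main() {
        let counter = Arc::new(Mutex::new(0u64));
        let handles: Vec<_> = (0..4)
            .map(|_| {
                let counter = Arc::clone(&counter);
                thread::spawn(move || increment(&counter))
            })
            .collect();
        for h in handles {
            h.join().unwrap();
        }
        assert_eq!(*counter.lock().unwrap(), 4);
    }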

Thundering Herd Problem

This is when some condition is fulfilled and triggers a large number of threads to wake up. It is likely that they can't all proceed, so some will block and later be awoken again, all at once. In this case it would be better to wake up one thread at a time instead of all of them.

Lost Wakeup Problem

Waking up only one thread at a time has its own problems. For instance, on a condition variable you can choose to wake up one waiting thread with notify_one() or all waiting threads with notify_all(). If you use notify_one(), you can encounter the lost wakeup problem: if the one thread that wakes up cannot actually proceed, no other waiter is notified, and the wakeup is effectively lost.

We typically just use notify_all() in all situations…
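
A minimal sketch of that choice on a Rust Condvar (the shared queue is made up for illustration); note the predicate loop, which is what makes waking safe:

    use std::sync::{Arc, Condvar, Mutex};
    use std::thread;

    fn main() {
        let shared = Arc::new((Mutex::new(Vec::<i32>::new()), Condvar::new()));

        let consumer = {
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                let (queue, cond) = &*shared;
                let mut guard = queue.lock().unwrap();
                // Always re-check the condition in a loop: a wakeup is
                // only a hint that the state *may* have changed.
                while guard.is_empty() {
                    guard = cond.wait(guard).unwrap();
                }
                println!("got {}", guard.remove(0));
            })
        };

        let (queue, cond) = &*shared;
        queue.lock().unwrap().push(42);
        // notify_all() wakes every waiter (thundering herd, but no lost
        // wakeups); notify_one() wakes a single waiter, which loses the
        // wakeup if that thread cannot actually proceed.
        cond.notify_all();

        consumer.join().unwrap();
    }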

If we could get rid of locks and waiting altogether, we could avoid a bunch of problems (lock convoys, deadlocks, starvation). See Atomicity.

Related