Concurrency

Why Concurrency?

Concurrency is a decoupling strategy. It helps us decouple what gets done from when it gets done.

Myths and Misconceptions

Consider these common myths and misconceptions:

  • Concurrency always improves performance: only when there is a lot of wait time that can be shared between multiple threads or multiple processors.
  • Design does not change when writing a concurrent program: the design of a concurrent algorithm can be remarkably different from the design of a single-threaded system.
  • Understanding concurrency issues is not important when working with a container: in fact, you need to know what your container is doing and how to guard against the concurrent update and deadlock issues described below.

Here are a few more balanced sound bites regarding writing concurrent software:

  • Concurrency incurs some overhead, both in performance and in the additional code that must be written.
  • Correct concurrency is complex, even for simple problems.
  • Concurrency bugs aren’t usually repeatable, so they are often ignored as one-offs instead of the true defects they are.
  • Concurrency often requires a fundamental change in design strategy.

Challenges

What makes concurrent programming so difficult? Consider the following trivial class:

public class X {
  private int lastIdUsed;

  public int getNextId() {
    return ++lastIdUsed;
  }
}

Let’s say we create an instance of X, set the lastIdUsed field to 42, and then share the instance between two threads. Now suppose that both of those threads call the method getNextId(). Then one possible outcome is that thread one gets the value 43, thread two gets the value 43 and lastIdUsed is 43.

This surprising result occurs when the two threads step on each other. This happens because there are many possible paths that the two threads can take through that one line of Java code, and some of those paths generate incorrect results.
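
To see this interference for yourself, a quick (and inherently non-deterministic) harness along the following lines can be run against class X; the duplicate-detection approach and the iteration count are illustrative choices, not part of the original example:

import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class RaceDemo {
  public static void main(String[] args) throws InterruptedException {
    X x = new X();
    Set<Integer> ids = ConcurrentHashMap.newKeySet();

    Runnable task = () -> {
      for (int i = 0; i < 100_000; i++) {
        // If the two threads interleave inside ++lastIdUsed, the same id is
        // produced twice and add() returns false.
        if (!ids.add(x.getNextId())) {
          System.out.println("duplicate id detected");
        }
      }
    };

    Thread t1 = new Thread(task);
    Thread t2 = new Thread(task);
    t1.start();
    t2.start();
    t1.join();
    t2.join();
  }
}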

Concurrency Defense Principles

What follows is a series of principles and techniques for defending your systems from the problems of concurrent code.

Single Responsibility Principle

Concurrency design is complex enough to be a reason to change in its own right and therefore deserves to be separated from the rest of the code. Here are a few things to consider:

  • Concurrency-related code has its own life cycle of development.
  • Concurrency-related code has its own challenges, which are different from and often harder than those of nonconcurrent code.

Recommendation: Keep your concurrency-related code separate from other code.

Limit the Scope of the Data

As we saw, two threads modifying the same field of a shared object can interfere with each other, causing unexpected behavior. It is important to restrict the number of such critical sections.

Recommendation: Take data encapsulation to heart; severely limit the access of any data that may be shared.

Use Copies of Data

A good way to avoid shared data is to avoid sharing the data in the first place. If there is an easy way to avoid sharing objects, the resulting code will be far less likely to cause problems. You might be concerned about the cost of all the extra object creation. It is worth experimenting to find out if this is in fact a problem. However, if using copies of objects allows the code to avoid synchronizing, the savings in avoiding the intrinsic lock will likely make up for the additional creation and garbage collection overhead.
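
For example, here is a minimal sketch (the worker and method names are assumptions) in which each thread receives its own defensive copy of the input rather than a reference to the shared collection:

import java.util.ArrayList;
import java.util.List;

public class CopyOfDataExample {
  // Each worker thread gets its own copy of the input list, so the threads
  // never touch the same collection and no lock is required.
  public void startWorkers(List<String> input) {
    for (int i = 0; i < 2; i++) {
      List<String> privateCopy = new ArrayList<>(input); // defensive copy
      new Thread(() -> process(privateCopy)).start();
    }
  }

  private void process(List<String> data) {
    // read and modify the private copy freely; results can be merged later
  }
}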

Threads Should Be as Independent as Possible

Consider writing your threaded code such that each thread exists in its own world, sharing no data with any other thread.

Recommendation: Attempt to partition data into independent subsets that can be operated on by independent threads, possibly in different processors.

Know Your Execution Models

We need to understand some basic definitions.

  • Bound resources: Resources of a fixed size or number used in a concurrent environment.
  • Mutual exclusion: Only one thread can access shared data or a shared resource at a time.
  • Starvation: One thread or a group of threads is prohibited from proceeding for an excessively long time or forever.
  • Deadlock: Two or more threads waiting for each other to finish. Each thread has a resource that the other thread requires and neither can finish until it gets the other resource.
  • Livelock: Threads in lockstep, each trying to do work but finding another “in the way.”


Producer-Consumer

One or more producer threads create some work and place it in a buffer or queue. One or more consumer threads acquire that work from the queue and complete it. The queue between the producers and consumers is a bound resource.
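
A minimal sketch of this model using java.util.concurrent.BlockingQueue follows; produceWork and consume are placeholder methods, and the queue capacity is an arbitrary illustrative choice:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ProducerConsumerExample {
  // The bounded queue is the bound resource between producers and consumers.
  private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(10);

  public void startProducer() {
    new Thread(() -> {
      try {
        while (true) {
          queue.put(produceWork()); // blocks while the queue is full
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }).start();
  }

  public void startConsumer() {
    new Thread(() -> {
      try {
        while (true) {
          consume(queue.take()); // blocks while the queue is empty
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }).start();
  }

  private String produceWork() { return "work item"; }
  private void consume(String work) { /* complete the work */ }
}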

Readers-Writers

Coordinating readers so they do not read something a writer is updating and vice versa is a tough balancing act. Writers tend to block many readers for a long period of time, thus causing throughput issues.

A simple strategy makes writers wait until there are no readers before allowing the writer to perform an update. If there are continuous readers, however, the writers will be starved. On the other hand, if there are frequent writers and they are given priority, throughput will suffer. Finding that balance and avoiding concurrent update issues is what the problem addresses.
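
One common way to express this coordination in Java is ReentrantReadWriteLock, as in the following minimal sketch; it is only one possible policy, and the fairness trade-offs described above still have to be tuned for the real workload:

import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class SharedValue {
  private final ReadWriteLock lock = new ReentrantReadWriteLock();
  private int value;

  // Any number of readers may hold the read lock at the same time.
  public int read() {
    lock.readLock().lock();
    try {
      return value;
    } finally {
      lock.readLock().unlock();
    }
  }

  // A writer must wait until all readers have released the read lock.
  public void write(int newValue) {
    lock.writeLock().lock();
    try {
      value = newValue;
    } finally {
      lock.writeLock().unlock();
    }
  }
}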

Dining Philosophers

Imagine a number of philosophers sitting around a circular table. A fork is placed to the left of each philosopher. There is a big bowl of spaghetti in the center of the table. The philosophers spend their time thinking unless they get hungry. Once hungry, they pick up the forks on either side of them and eat. A philosopher cannot eat unless he is holding two forks.

Replace philosophers with threads and forks with resources and this problem is similar to many enterprise applications in which processes compete for resources. Unless carefully designed, systems that compete in this way can experience deadlock, livelock, throughput, and efficiency degradation.
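
One classic defense, sketched below under the assumption that the forks are simply numbered locks, is to make every philosopher acquire the lower-numbered fork first so that a circular wait can never form:

import java.util.concurrent.locks.ReentrantLock;

public class DiningTable {
  private final ReentrantLock[] forks;

  public DiningTable(int seats) {
    forks = new ReentrantLock[seats];
    for (int i = 0; i < seats; i++) {
      forks[i] = new ReentrantLock();
    }
  }

  // Each philosopher always picks up the lower-numbered fork first, so no
  // circular wait (and therefore no deadlock) can form.
  public void eat(int seat) {
    int left = seat;
    int right = (seat + 1) % forks.length;
    int first = Math.min(left, right);
    int second = Math.max(left, right);

    forks[first].lock();
    try {
      forks[second].lock();
      try {
        // eat the spaghetti
      } finally {
        forks[second].unlock();
      }
    } finally {
      forks[first].unlock();
    }
  }
}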

Beware Dependencies Between Synchronized Methods

There will be times when you must use more than one method on a shared object. When this is the case, there are three ways to make the code correct:

  • Client-Based Locking—Have the client lock the server before calling the first method and make sure the lock’s extent includes code calling the last method.
  • Server-Based Locking—Within the server create a method that locks the server, calls all the methods, and then unlocks. Have the client call the new method.
  • Adapted Server—Create an intermediary that performs the locking. This is an example of server-based locking where the original server cannot be changed.
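
For instance, when the server is a class you cannot change (such as an iterator from java.util), an adapted server can wrap it. The following minimal sketch assumes a hypothetical ThreadSafeIntegerIterator wrapper:

import java.util.Iterator;

public class ThreadSafeIntegerIterator {
  private final Iterator<Integer> iterator;

  public ThreadSafeIntegerIterator(Iterator<Integer> iterator) {
    this.iterator = iterator;
  }

  // The intermediary performs the locking: the hasNext/next pair executes
  // atomically, so two threads cannot interleave between the two calls.
  public synchronized Integer getNextOrNull() {
    return iterator.hasNext() ? iterator.next() : null;
  }
}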

Keep Synchronized Sections Small

Locks are expensive because they create delays and add overhead. So we want to design our code with as few critical sections as possible.

Recommendation: Keep your synchronized sections as small as possible.
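
As an illustration (the logging class and formatting step are assumptions), do the work that does not need the lock outside the critical section and guard only the shared data:

import java.util.ArrayList;
import java.util.List;

public class MessageLog {
  private final List<String> messages = new ArrayList<>();

  public void log(String raw) {
    // Do the expensive formatting outside the critical section...
    String formatted = System.currentTimeMillis() + " " + raw;

    // ...and keep the synchronized section just large enough to guard the
    // shared list.
    synchronized (messages) {
      messages.add(formatted);
    }
  }
}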

Writing Correct Shut-Down Code Is Hard

Writing a system that is meant to stay live and run forever is different from writing something that works for a while and then shuts down gracefully. Graceful shutdown can be hard to get correct. Common problems involve deadlock, with threads waiting for a signal to continue that never comes.

Recommendation: Think about shut-down early and get it working early.
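
One common shutdown pattern, sketched here with an ExecutorService and an arbitrary 30-second timeout, is to stop accepting new work, give running tasks a bounded time to finish, and only then interrupt the stragglers:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

public class GracefulShutdown {
  public static void shutdown(ExecutorService pool) throws InterruptedException {
    pool.shutdown();                                      // stop accepting new tasks
    if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
      pool.shutdownNow();                                 // interrupt running tasks
      if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
        System.err.println("Pool did not terminate");
      }
    }
  }
}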

Testing Threaded Code

Recommendation: Write tests that have the potential to expose problems and then run them frequently, with different programmatic and system configurations and under varying load. If tests ever fail, track down the failure. Don’t ignore a failure just because the tests pass on a subsequent run.

Here are a few more fine-grained recommendations:

  • Treat spurious failures as candidate threading issues.
  • Get your nonthreaded code working first.
  • Make your threaded code pluggable.
  • Make your threaded code tunable.
  • Run with more threads than processors.
  • Run on different platforms.
  • Instrument your code to try and force failures.

Make Your Threaded Code Pluggable

Write the concurrency-supporting code such that it can be run in several configurations:

  • One thread, several threads, varied as it executes.
  • Threaded code interacts with something that can be either real or a test double.
  • Execute with test doubles that run quickly, slowly, or at variable speed.
  • Configure tests so they can run for a number of iterations.
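
As a sketch of what “pluggable” can mean in practice (the class and interface names are assumptions), both the thread count and the collaborator are injected, so the same code can run with one thread or many, and against a real service or a test double:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class UrlFetcher {
  private final ExecutorService pool;
  private final UrlSource source;

  // The thread count and the collaborator are configuration, not hard-coded.
  public UrlFetcher(int threadCount, UrlSource source) {
    this.pool = Executors.newFixedThreadPool(threadCount);
    this.source = source;
  }

  public void fetchAll(int requests) {
    for (int i = 0; i < requests; i++) {
      pool.submit(() -> {
        String url = source.next(); // real generator or fast/slow test double
        // process the url here
      });
    }
  }

  public interface UrlSource {
    String next();
  }
}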

Run with More Threads Than Processors

Things happen when the system switches between tasks. To encourage task swapping, run with more threads than processors or cores. The more frequently your tasks swap, the more likely you are to encounter code that is missing a critical section or causes deadlock.
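
A small sketch of deliberate oversubscription during testing (the multiplier of four is an arbitrary illustrative choice, and runTestScenario is a placeholder):

public class Oversubscribe {
  public static void main(String[] args) {
    // Deliberately create more threads than cores so the scheduler swaps
    // tasks often.
    int processors = Runtime.getRuntime().availableProcessors();
    int threadCount = processors * 4;
    for (int i = 0; i < threadCount; i++) {
      new Thread(Oversubscribe::runTestScenario).start();
    }
  }

  private static void runTestScenario() {
    // exercise the concurrent code under test here
  }
}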

Run on Different Platforms

Different operating systems have different threading policies, each of which impacts the code’s execution. Multithreaded code behaves differently in different environments. You should run your tests in every potential deployment environment.

Instrument Your Code to Try and Force Failures

It is normal for flaws in concurrent code to hide. Simple tests often don’t expose them. Indeed, they often hide during normal processing. How might you increase your chances of catching such rare occurrences? You can instrument your code and force it to run in different orderings.

There are two options for code instrumentation:

  • Hand-coded
  • Automated

Hand-coded

You can insert calls to wait(), sleep(), yield(), and priority-adjusting methods such as setPriority() in your code by hand. Here is an example of doing just that:

public synchronized String nextUrlOrNull() {
  if(hasNext()) {
    String url = urlGenerator.next();
    Thread.yield(); // inserted for testing.
    updateHasNext();
    return url;
  }
  return null;
}

The inserted call to yield() will change the execution pathways taken by the code and possibly cause the code to fail where it did not fail before. There are many problems with this approach:

  • You have to manually find appropriate places to do this.
  • How do you know where to put the call and what kind of call to use?
  • Leaving such code in a production environment unnecessarily slows the code down.
  • You may or may not find flaws.

What we need is a way to do this during testing but not in production. We also need to easily mix up configurations.

Automated

You could use tools like an Aspect-Oriented Framework to programmatically instrument your code. For example, you could use a class with a single method:

public class ThreadJigglePoint {
  public static void jiggle() {
  }
}

You can add calls to this in various places within your code:

public synchronized String nextUrlOrNull() {
  if(hasNext()) {
    ThreadJigglePoint.jiggle();
    String url = urlGenerator.next();
    ThreadJigglePoint.jiggle();
    updateHasNext();
    ThreadJigglePoint.jiggle();
    return url;
  }
  return null;
}

Now you use a simple aspect that randomly selects among doing nothing, sleeping, or yielding. Or imagine that the ThreadJigglePoint class has two implementations. The first implements jiggle to do nothing and is used in production. The second generates a random number to choose between sleeping, yielding, or just falling through. If you run your tests a thousand times with random jiggling, you may root out some flaws.
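
A minimal sketch of that second, test-time implementation might look like the following; the one-millisecond sleep and the three-way choice are illustrative assumptions:

import java.util.Random;

public class ThreadJigglePoint {
  private static final Random random = new Random();

  public static void jiggle() {
    int choice = random.nextInt(3);
    if (choice == 0) {
      Thread.yield();                 // give up the processor
    } else if (choice == 1) {
      try {
        Thread.sleep(1);              // pause briefly
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
    // choice == 2: fall through and do nothing
  }
}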