Collections.unmodifiable revisited


Writing correct concurrent programs is notoriously hard. Concurrency bugs are extremely hard to find and may manifest only rarely (and when they do occur, they are frequently attributed to cosmic rays). Speaking of Java: just because a multi-threaded program works on your machine in your VM does not mean that it won’t come crashing down in flames (luckily, aircraft software follows a tough certification process) on another VM or on another machine with a different number of cores. One thing that makes multi-threaded programming especially hard is caching. In Java, threads may cache non-volatile data, and if updates to this data are not synchronized, they may never become visible to other threads. A typical symptom is a program that never terminates because its last remaining thread simply does not see that another thread has set its boolean stop flag to true.
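To illustrate the stop-flag scenario, here is a minimal sketch of the usual remedy: declaring the flag volatile, which guarantees that the write by one thread becomes visible to the worker. The class StopFlagDemo and its method runWorkerAndStop are hypothetical names chosen for this example. Without volatile, the Java Memory Model allows the worker to spin forever, although whether it actually does so depends on the JIT compiler and the machine.

```java
public class StopFlagDemo {

    // volatile guarantees that the write in runWorkerAndStop() becomes
    // visible to the worker thread. Without it, the Java Memory Model
    // allows the worker to keep reading a stale false forever.
    private volatile boolean stop = false;

    /** Starts a busy worker, tells it to stop, and reports whether it ended. */
    public boolean runWorkerAndStop() {
        Thread worker = new Thread(() -> {
            while (!stop) {
                // busy-wait: each iteration performs a fresh volatile read
            }
        });
        worker.setDaemon(true); // a stuck worker must not keep the JVM alive
        worker.start();
        try {
            Thread.sleep(100);  // let the worker spin for a moment
            stop = true;        // volatile write, guaranteed to become visible
            worker.join(5_000); // wait at most five seconds for the worker
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("Worker terminated: " + new StopFlagDemo().runWorkerAndStop());
    }
}
```

The worker is made a daemon thread only so that a failed run cannot keep the JVM alive; it plays no part in the visibility guarantee itself.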

All in all, many concurrency problems have their roots in shared access to mutable data. An elegant way of getting rid of these problems is therefore not to share mutable data in the first place, and to share immutable data instead. In Java, the collection classes provide ways of storing, well, collections of similar data, and there are implementations for the most common data structures. So, sharing only immutable data often boils down to sharing immutable versions of the collections used internally. The utility class Collections provides methods for working with and transforming Collection objects, and some of these methods, such as Collections.unmodifiableCollection(), Collections.unmodifiableSet() and Collections.unmodifiableList(), provide unmodifiable access to an original (possibly internal) collection. However, things are still not that easy, and there is a subtle problem in using these methods for sharing data. To make this clearer, consider the following code:

Set<String> origin = new HashSet<>();
origin.add("one");
origin.add("two");
Set<String> unmodifiable = Collections.unmodifiableSet(origin);
someMultiThreadedContext.process(unmodifiable); // async processing, returns immediately
origin.add("three");

You might expect that someMultiThreadedContext gets the set containing two elements and then does whatever it wants with it, while you go on updating the internal state (aka origin) as you like without affecting someMultiThreadedContext. It really pays off to read the Javadocs more thoroughly sometimes: what Collections.unmodifiableSet() provides is an unmodifiable wrapper, not an unmodifiable copy. If someMultiThreadedContext attempted to alter the set, it would receive an UnsupportedOperationException. This does NOT mean that the contents of unmodifiable are actually unmodifiable. It is just a wrapper collection that, at its core, still holds a reference to origin. This means that if origin is updated, someMultiThreadedContext might still see this update.
To make this entirely clear, consider this main method:

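	import java.util.Collections;
	import java.util.HashSet;
	import java.util.Set;

	public class UnmodifiableWrapperDemo {
		public static void main(String[] args) {
			Set<String> origin = new HashSet<>();
			origin.add("one");
			origin.add("two");
			// A read-only wrapper around origin, not a copy of it
			Set<String> unmodifiable = Collections.unmodifiableSet(origin);
			System.out.println("Size: " + unmodifiable.size());
			origin.add("three");
			System.out.println("Size: " + unmodifiable.size());
		}
	}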

which prints:

Size: 2
Size: 3
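The difference between a wrapper and a copy can be made visible directly. The following sketch (class and method names are hypothetical) wraps a private copy of origin instead of origin itself: the snapshot keeps its two elements after origin changes, while the plain wrapper reflects the update just like in the output above.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class ViewVersusSnapshot {

    /** Read-only view: every call is delegated to the original set. */
    public static Set<String> viewOf(Set<String> origin) {
        return Collections.unmodifiableSet(origin);
    }

    /** Read-only snapshot: copy first, then wrap the private copy. */
    public static Set<String> snapshotOf(Set<String> origin) {
        return Collections.unmodifiableSet(new HashSet<>(origin));
    }

    public static void main(String[] args) {
        Set<String> origin = new HashSet<>();
        origin.add("one");
        origin.add("two");
        Set<String> view = viewOf(origin);
        Set<String> snapshot = snapshotOf(origin);

        origin.add("three");
        System.out.println("View size: " + view.size());         // 3
        System.out.println("Snapshot size: " + snapshot.size()); // 2

        try {
            snapshot.add("four");
        } catch (UnsupportedOperationException e) {
            System.out.println("Snapshot is still read-only");
        }
    }
}
```

On Java 10 and later, Set.copyOf(origin) returns an unmodifiable copy in a single call (unlike HashSet, it rejects null elements). Copying decouples the data from origin, but handing it to another thread still requires a safe handoff.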

And here the visibility problems begin: you cannot use Collections.unmodifiableCollection() to implement a view of a thread-unsafe collection in a multi-threaded context, because the wrapper adds no visibility guarantees. After all, it just produces a wrapper whose state-changing methods are overridden to throw exceptions; there is no synchronization and no volatile involved. Because of this, other threads may cache the contents of the set locally and never see an update of origin. This means that Collections.unmodifiableCollection() cannot be used in this manner to propagate state to other concurrent parts of the program. You need to synchronize access to origin, either by doing the synchronization yourself or by using a thread-safe implementation of Set. The fact that an unmodifiable collection does not necessarily provide visibility is just too easily forgotten (“No problem, my collection is immutable, nothing can go wrong!”).
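One way of following this advice is sketched below, assuming Java 8 or later: back the view with ConcurrentHashMap.newKeySet(), a thread-safe Set, and hand out only the unmodifiable wrapper. The class ConcurrentBackedView and its methods are hypothetical names for this example. Collections.synchronizedSet() would also work for single calls, but its Javadoc requires callers to synchronize manually on the synchronized set while iterating, which readers holding only the unmodifiable wrapper cannot do.

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentBackedView {

    // Thread-safe backing set: its operations provide the happens-before
    // guarantees that a plain HashSet lacks.
    private final Set<String> origin = ConcurrentHashMap.newKeySet();

    // Read-only view handed out to other threads. It still delegates to
    // origin, but every read now goes through a thread-safe implementation.
    private final Set<String> view = Collections.unmodifiableSet(origin);

    public Set<String> view() {
        return view;
    }

    public void add(String element) {
        origin.add(element);
    }

    /** A reader thread waits until it observes the expected number of elements. */
    public static boolean readerSeesUpdates(int expected) {
        ConcurrentBackedView owner = new ConcurrentBackedView();
        Thread reader = new Thread(() -> {
            while (owner.view().size() < expected) {
                // busy-wait: updates to the concurrent set become visible
            }
        });
        reader.setDaemon(true); // a stuck reader must not keep the JVM alive
        reader.start();
        for (int i = 0; i < expected; i++) {
            owner.add("element-" + i);
        }
        try {
            reader.join(5_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !reader.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("Reader saw all updates: " + readerSeesUpdates(3));
    }
}
```

Iteration over such a view is weakly consistent, as for any ConcurrentHashMap-based key set: it never throws ConcurrentModificationException, but it may or may not reflect updates made while it runs.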

Please note that Collections.unmodifiableCollection() is still a way of sharing immutable data, as long as the data is fire-and-forget and not a view of internal state whose updates have to be seen: for instance, a message that is sent to another subsystem and then never touched again by the sender.
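For the fire-and-forget case, a minimal sketch might copy the data before handing it off, so that the receiver never shares state with origin. It assumes an ExecutorService as the handoff mechanism, and the class and method names are hypothetical. The java.util.concurrent package documents that actions in a thread prior to submitting a task happen-before the task begins, which is what makes the copy safely visible to the worker; storing it in a plain, unsynchronized field for another thread to pick up would not give that guarantee.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FireAndForgetHandoff {

    /** Hands a private, unmodifiable snapshot of origin to a worker thread. */
    public static int sizeSeenByWorker() throws Exception {
        Set<String> origin = new HashSet<>();
        origin.add("one");
        origin.add("two");

        // Copy first, then wrap: the message never shares state with origin.
        Set<String> message = Collections.unmodifiableSet(new HashSet<>(origin));

        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            // Submission happens-before execution, so the worker is
            // guaranteed to see the fully built copy.
            Future<Integer> result = executor.submit(() -> message.size());
            origin.add("three"); // does not affect the message already sent
            return result.get();
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Worker saw " + sizeSeenByWorker() + " elements"); // 2
    }
}
```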
