Share Button

Yes, I will talk about enums again, I find these guys fascinating. And I’m one of these people who loves to know what’s happening behind the scene, to scratch the surface and (reasonably) go as deep as I can.

So you’ve heard about HashMap or HashSet right ? It turns out they have brothers (or sisters, up to you). They are called EnumMap and EnumSet, and they also extend AbstractMap and AbstractSet. When I discovered this, I obviously asked myself, why did they do this, how does it work ?

As you can imagine they are dedicated to be used with enums only, the enum being the key. So let’s go deep into the OpenJDK implementation. I’m not going to talk about the implementation of the HashMap (or HashSet), they are plenty of posts about it on the internet. I’ll start directly with the explanation of EnumMap and EnumSet because I never found any implementation’s explanation.

During the whole post, we will work with this enum as an example:

enum Direction {
  NORTH, SOUTH, EAST, WEST
}

And let’s say it right now, these implementations are all about optimization, and it will refresh your boolean algebra knowledge.

EnumMap, say bye bye to the buckets

We can describe an EnumMap as a HashMap, but with a constraint: the key has to be an enum. The compiler is enforcing that constraint by using generics in the declaration of the EnumMap:

public class EnumMap<K extends Enum<K>, V>
Construction

The first surprising thing when you start to use an EnumMap is that, contrary to HashMap, it doesn’t have a default constructor. If you want to create an EnumMap you have to give to the constructor the java.lang.Class of the enum used as a key:

Map<Direction,String> myEnumMap = new EnumMap<Direction,String>(Direction.class);

Why would it need the java.lang.Class of the enum ?
Keep in mind that the number of instances of the enum is fixed, you can’t create new enum values on the fly. That being known, that means an EnumMap can have maximum n entries where n is the number of instances of the enum. This is why it needs the java.lang.Class as a constructor’s parameter, the EnumMap will be able to initialize its internal size by using the class to know the  number of enum instances. No need of default initial capacity just like HashMap, an EnumMap will be perfectly sized immediately.

Key/Value association

With the information above, the EnumMap will instantiate an array with the size of the number of enum. This array will contain the values, but how is the association between the value and the key made ?

The big difference between a HashMap and an EnumMap is the probability of collision, it’s simply impossible with the EnumMap. Where a HashMap will use the hashCode of the key to determine the bucket, EnumMap will simply use the ordinal value of the key as the index in the array containing the values. It makes things way more simpler because the ordinal value is unique (in the scope of the current enum of course):

public V get(Object key) {
  return values[((Enum)key).ordinal()];
}

public V put(K key, V value) {
  int index = key.ordinal();
  Object oldValue = values[index];
  values[index] = value;
  return oldValue;
}

The code above has been simplified but that’s almost exactly what is happening when you try to get/put a value in a EnumMap. Way more simpler and efficient than an HashMap. I told you it was all about optimization, a HashMap with an enum as the key would work, but it wouldn’t be as efficient.

A little bit more about the optimization

Talking about optimization, you remember I said the constructor needs the java.lang.Class to retrieve the number of enums for its internal initialization right ? But I didn’t say how it’s fetching the number of enum !

The java.lang.Class object offers the method getEnumConstants() which returns an array of the enum instances, if the class used is not an enum, this method returns null. It sounds like exactly what we need ! But surprisingly this is not the method used by EnumMap to get the array. Instead it’s using the mystical  sun.misc.SharedSecrets and its member JavaLangAccess.

JavaLangAccess offers a method getEnumConstantsShared(java.lang.Class<E> eClass), this method returns an array of the enum instances for the class given as a parameters. So why using this one instead of calling directly getEnumConstants() on the java.lang.Class ? Because getEnumConstants() creates for every call a new array, contrary to getEnumConstantsShared() which returns every time the same cached array. It makes a big difference in terms of performance. 

Actually, there is a method in java.lang.Class which returns a cached array of the enum’s instances (and that’s the one used by JavaLangAccess), but EnumMap can’t use this method directly because it is only available in the java.lang package (and EnumMap is in java.util). This is why EnumMap has to go around this restriction by using the JavaLangAccess implementation. If you’re curious, you can have a look at the implementation of JavaLangAccess in java.lang.System.

This is how the EnumMap works, simple and very efficient because the size is fixed and each key is guaranteed collisionless, but it becomes a little bit more sophisticated for the EnumSet.

EnumSet, it’s all about bit shifting

Just like EnumMap, EnumSet is the equivalent of HashSet but values in the Set can only be an enum instances:

public abstract class EnumSet<E extends Enum<E>>

But, wait a minute, can you figure it out the difference in the declaration ? EnumSet is an abstract class ! That means there is at least one underlying implementation. Actually there are two classes extending EnumSetRegularEnumSet and JumboEnumSet.

RegularEnumSet will be used if the number of enum instances is under or equals to 64, if it goes above, the JumboEnumSet implementation will be used. But these classes are not public, they are only available in the java.util package. So this is not up to you to decide which implementation you want to use.

Construction

EnumSet offers a static public method noneOf to create an empty Set. Just like EnumMap, the method expects the java.lang.Class of the enum. The same system is used to retrieve the number of enum instances (getEnumConstantsShared through JavaLangAccess). Depending the size of the array, RegularEnumSet or JumboEnumSet will be used:

public static <E extends Enum<E>> EnumSet<E> noneOf(Class<E> elementType) {
  Enum[] instances = SharedSecrets.getJavaLangAccess().getEnumConstantsShared(elementType)
  if (instances.length <= 64)
    return new RegularEnumSet<>(elementType, instances);
  else
    return new JumboEnumSet<>(elementType, instances);
  }
}

Why 64 as the limit ? Because 64 is the number of bit composing the long data type. Do you see where this is heading ? With a long variable we can cover the representation of the presence of an enum in the EnumSet with 1 for present and 0 for not present. This is what the RegularEnumSet implementation is doing.

RegularEnumSet

So, the RegularEnumSet uses a long data type member to represent  the content of the Set. Each bit of the long can be associated to a number, from 0 to 63.  By default the value is initialized to 0, which means the Set is empty:

elementBinaryAdd a value

Let’s say we want to add the enum Direction.EAST to the EnumSet. The ordinal value is 2. The goal is to set the bit number 2 with the value 1. To reach this state, the add(E e) method will perform multiple binary operations:

First it’s using a new long variable and set the bit, with ordinal value as the index, by operating a signed left shift:

leftShift

Now, we have a long variable which represents the presence of the enum in the Set. The last operation consists in combining this long with the existing values. To merge the long setBitWithOrdinal and the private long elements, a simple bitwise inclusive OR operation is executed between the two value:

After the binary OR operation, the value of elements contains the new enum and still the previous values of other enums too. In the example above, we can see the enum with ordinal 0 was present in the EnumSet and we added the enum with ordinal 2. Now both enums are represented.

Contains a value

To determine if an enum is present in the EnumSet, the implementation has to check if the bit n of elements is set to 1. Where n is the ordinal value of the enum of course. The operation is very close to the one adding a value. First, we need to represent the ordinal value in a long variable, let’s take Direction.EAST as the example again (ordinal == 2) and apply the signed left shift:

leftShift

Now that we have a long variable representing the enum, a binary AND operation will be applied between setbitWithOrdinal and the private long elements of the EnumSet. The result of this operation will have all the other ordinal values from element to 0 (because the other enums are not represented in setbitWithOrdinal) and if the concerned enum is present in elements, the binary AND will keep the value 1, otherwise it will be a 0, just like the other values. That means if the enum is not present in the EnumSet, all the bits of the long variable computed by this operation will be set to 0, so the result will be 0. A result different from 0 will signify the enum is present is in the EnumSet.

andOperation

Pretty fancy system right ? Very efficient. I didn’t cover other methods, like remove(Object e) or containsAll(Collection c), because it’s the same spirit at the previous operations, if you understood what happened above you got the essential. Now let’s see when the number of enum instances is bigger than 64.

JumboEnumSet

So the size is bigger than 64, it doesn’t fit anymore in a long.  Which data structure is it using ? An array of long ! For example, if the number of enum instances is 100, we only need an array of long sized with 2 (64 bits + 64 bits > 100), it covers the need. It’s still using the bit representation with multiple long variables.

The way to calculate the array’s size is a little bit fancy, a classic way would to divide the number of instances by 64 and round up to the next integer.

Instead, it’s using the right shift arithmetic to make the division by 64 and use the integer part of the result as the size of the array:

elements = new long[(universe.length + 63) >>> 6];

When you want to divide a number by a power of 2 (64 is 26), you just need to do a binary right shift with the exponent. It’s a fancy and very efficient way to make a division.

Add a value

It works exactly the same way as the RegularEnumSet, except that before applying the OR binary operation, it’s retrieving the long in the array concerned by the enum’s ordinal. To calculate the index, a simple division by 64 on the enum’s ordinal (still using the right shifting technique) :

public boolean add(E e) {
  int eOrdinal = e.ordinal();
  int index = eOrdinal >>> 6; // Calculate the index of the array
  long oldElements = elements[index];
  long setBitWithOrdinal = 1L << eOrdinal; // Represents the ordinal in a long
  elements[index] = elements[index] | setBitWithOrdinal; // Merge the current elements with the new ordinal
  boolean result = (elements[index] != oldElements);
  return result;
}
Contains a value

Again, the binary operation itself to check if the enum is represented in the Set doesn’t change, compare to RegularEnumSet. It’s all about retrieving the long in the array depending of the ordinal.

public boolean contains(Object e) {
  int eOrdinal = ((Enum)e).ordinal();
  int index = eOrdinal >>> 6; // Calculate the index of the array
  long setBitWithOrdinal = 1L << eOrdinal; // Represents the ordinal in a long
  long comparisonResult = elements[index] & setBitWithOrdinal; // Check if ordinal is present in the array
  return comparisonResult != 0;
}

It’s really all about optimization with binary operations.
Does it really make a big difference between a HashSet and EnumSet ? Definitely YES !

Little benchmark

It wrote this very quickly, it’s a Set containing all the values of an enum, I’m looping 10 000 000 times to check if the Set contains values:

enum Direction {
  NORTH, SOUTH, EAST, WEST;
}

public static void main(String[] args) {
  long startTime = System.currentTimeMillis();

  Set<Direction> directionsSet = new HashSet<Direction>();
  directionsSet.addAll(Arrays.asList(Direction.values()));
  for (int i = 0; i < 10000000; i++) {
    directionsSet.contains(Direction.NORTH);
    directionsSet.contains(Direction.SOUTH);
    directionsSet.contains(Direction.EAST);
    directionsSet.contains(Direction.WEST);
  }

  long stopTime = System.currentTimeMillis();
  long elapsedTime = stopTime - startTime;

  System.out.println(elapsedTime);
}

On my Macbook Air, the result is 225ms for the HashSet implementation.
If we replace the implementation with an EnumSet:

Set directionsSet = EnumSet.noneOf(Direction.class);

The same loop runs in only 18ms. Good job Joshua !

Share Button