First Thoughts On Java’s Valhalla
After watching Brian Goetz’s presentation on Valhalla, I started thinking more seriously about how value classes work. Some things are exciting, but a few are pretty concerning too. Below are my thoughts; please reach out if I missed something!
Equality (==) is No Longer Cheap
Pre-Valhalla, checking whether two references pointed to the same object was cheap: a single word comparison.
Valhalla makes the cost depend on the runtime type of the object, since two distinct value objects with the same state must compare equal. This also implies an extra null check, because the VM can’t load the class word before it knows the reference is non-null. Even with a segfault handler to elide the explicit null check, the performance of == would no longer be consistent.
This isn’t the end of the world for high-performance computing, but it doesn’t seem like a big win either: everyone’s code bears the cost.
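A rough model in today’s Java of the branches a Valhalla-style == has to walk may make the extra cost more concrete. This is a sketch under stated assumptions, not how HotSpot actually implements the comparison: `ValueLike` is a hypothetical marker interface standing in for "this is a value class", and a record’s generated equals() stands in for the VM’s field-by-field comparison.

```java
// A hypothetical model of Valhalla-style `==`, written in today's Java.
final class AcmpModel {
    // Hypothetical marker standing in for "this class is a value class".
    interface ValueLike {}

    record Point(int x, int y, int z) implements ValueLike {}

    // Pre-Valhalla: a single word comparison.
    static boolean oldEq(Object a, Object b) {
        return a == b;
    }

    // Valhalla-style: several branches, and the class-word load needs a null check first.
    static boolean newEq(Object a, Object b) {
        if (a == b) return true;                         // same reference: still the fast path
        if (a == null || b == null) return false;        // rule out null before touching the header
        if (a.getClass() != b.getClass()) return false;  // load and compare class words
        if (!(a instanceof ValueLike)) return false;     // identity objects: distinct references are unequal
        return a.equals(b);                              // value objects: compare state (record equals as a stand-in)
    }

    public static void main(String[] args) {
        Point p = new Point(1, 2, 3);
        Point q = new Point(1, 2, 3);
        System.out.println(oldEq(p, q)); // false: two distinct allocations today
        System.out.println(newEq(p, q)); // true: same state, so interchangeable
    }
}
```

The point of the model is the shape of newEq: the common "not the same reference" case now falls through to a null check, a header load, and possibly a field comparison.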
It appears most of the performance optimizations Valhalla enables are not implemented yet, so it’s hard to tell whether the memory layout improvements are worth the expense.
Minor: IdentityHashMap is now a performance liability. Don’t accidentally put a value object in one, or else.
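For contrast, here is how IdentityHashMap treats records today: keys are compared with == and hashed with System.identityHashCode, so two equal records are two distinct keys. My reading of JEP 401 is that with value-class keys, == compares state, so the same two puts would land on one entry, and the identity hash would have to be derived from the fields rather than read from a header. The demo below only shows today’s behavior; the Valhalla side is an assumption, not something I have run.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Today's behavior: IdentityHashMap compares keys with == and hashes them
// with System.identityHashCode, ignoring equals()/hashCode().
final class IdentityMapDemo {
    // An ordinary (identity) record standing in for the value record Point.
    record Point(int x, int y, int z) {}

    static int countKeys() {
        Map<Point, String> map = new IdentityHashMap<>();
        map.put(new Point(1, 2, 3), "first");
        map.put(new Point(1, 2, 3), "second"); // equal state, different identity
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(countKeys()); // 2 on today's JDKs
    }
}
```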
AtomicReference
How value classes will interact with AtomicReference seems to be an open issue. Value objects can be passed around by value, but they can also be passed by reference, depending on the VM. However, AtomicReference is defined in terms of == for operations like compareAndSet, and value objects no longer have an atomic comparison: == on two value objects compares their fields, not a single pointer. What will happen? Consider the following sequence of events:
import java.util.concurrent.atomic.AtomicReference;

value record Point(int x, int y, int z) {}
static final AtomicReference<Point> POINT =
    new AtomicReference<>(new Point(1, 2, 3));
- T1 starts POINT.compareAndSet(new Point(1, 2, 3), new Point(4, 5, 6))
- T2 starts POINT.compareAndSet(new Point(1, 2, 3), new Point(1, 2, 3))
- T2 finishes its compareAndSet() and wins
- T1 finishes its compareAndSet()
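With identity objects, a regular AtomicReference would return false for T1 (assuming each thread passed the reference it had read as the expected value), even though the stored value equals the expected value before, during, and after the call: T2 swapped in a different object. We can use that failure to resolve the race. With value objects, though, what could it do?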
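Today’s semantics are easy to check with an ordinary record standing in for Point: compareAndSet succeeds only when the expected argument is the very reference currently stored. The helper `casByValue` below is a hypothetical workaround, not a JDK API: it compares by equals() and then CASes against the reference it actually observed, which is roughly the "compare state" behavior a value-object AtomicReference would need. Note that it deliberately does not fail when another thread swaps in an equal-valued object, which is exactly the ambiguity raised above.

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicReference;

final class CasDemo {
    // An ordinary (identity) record standing in for the value record Point.
    record Point(int x, int y, int z) {}

    // Hypothetical helper: CAS by state rather than by reference.
    // Retries while the stored value still equals `expected`, so a concurrent
    // swap to an equal-valued object does not make it fail.
    static <T> boolean casByValue(AtomicReference<T> ref, T expected, T update) {
        while (true) {
            T current = ref.get();
            if (!Objects.equals(current, expected)) {
                return false;                  // state differs: a genuine failure
            }
            if (ref.compareAndSet(current, update)) {
                return true;                   // replaced the exact reference we read
            }
            // The reference changed between get() and compareAndSet(); retry.
        }
    }

    public static void main(String[] args) {
        AtomicReference<Point> point = new AtomicReference<>(new Point(1, 2, 3));

        // Identity CAS: an equal but distinct Point is not the expected value.
        System.out.println(point.compareAndSet(new Point(1, 2, 3), new Point(4, 5, 6))); // false

        // Identity CAS with the stored reference itself succeeds (this plays T2).
        Point seen = point.get();
        System.out.println(point.compareAndSet(seen, new Point(1, 2, 3))); // true

        // State-based CAS (this plays T1): succeeds because the state is still (1, 2, 3).
        System.out.println(casByValue(point, new Point(1, 2, 3), new Point(4, 5, 6))); // true
        System.out.println(point.get()); // Point[x=4, y=5, z=6]
    }
}
```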
Where is the Class Word?
Without object identity, most of the object header isn’t needed. The identity hash code, synchronization bits, and probably any GC bits aren’t needed anymore. But what about valueObj.getClass()?
I can’t see an easy way of implementing it. If the class word is adjacent to the object state in memory, we don’t get nearly the memory savings we wanted.
A single class pointer shared by a whole array of value objects doesn’t solve it either. Consider:
value record Point(int x, int y, int z) {}
Object[] points =
    new Object[]{new Point(1, 2, 3), new Point(4, 5, 6)};
for (Object p : points) {
    System.out.println(p.getClass());
}
The VM would have to either prove that every object in the array has the same class, or else store the class pointer per object.
It would be great to see how the class pointer is elided in practice.
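One way to picture the case that does work: when the static element type is exactly one value class, a VM could keep the class once for the whole array and store only the fields. The hypothetical `FlatPointArray` below is a hand-written model of that layout, not how any JVM actually lays out arrays. Every read has to materialize a Point, and nothing like this is possible for the Object[] above, whose elements may be of any class.

```java
// Hypothetical model of a flattened Point[]: the element class is implied once
// by the container's type, and only x, y, z are stored per element.
final class FlatPointArray {
    // An ordinary record standing in for the value record Point.
    record Point(int x, int y, int z) {}

    private static final int FIELDS = 3; // x, y, z
    private final int[] data;            // [x0, y0, z0, x1, y1, z1, ...]

    FlatPointArray(int length) {
        this.data = new int[length * FIELDS];
    }

    int length() {
        return data.length / FIELDS;
    }

    void set(int i, Point p) {
        int base = i * FIELDS;
        data[base] = p.x();
        data[base + 1] = p.y();
        data[base + 2] = p.z();
    }

    // No per-element header is read: the class of every element is known statically.
    Point get(int i) {
        int base = i * FIELDS;
        return new Point(data[base], data[base + 1], data[base + 2]);
    }

    public static void main(String[] args) {
        FlatPointArray points = new FlatPointArray(2);
        points.set(0, new Point(1, 2, 3));
        points.set(1, new Point(4, 5, 6));
        for (int i = 0; i < points.length(); i++) {
            Point p = points.get(i);
            System.out.println(p.getClass().getSimpleName() + " " + p);
        }
    }
}
```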
Intrusive Linked Lists and Trees
Value objects’ fields are implicitly final, which means they can’t really be used for mutable data structures. One of the things I miss from my C days is embedding the linked-list pointers directly in the value itself, i.e. an intrusive linked list. This saves space (there is no separate node object pointing at the value), but doesn’t appear to work for value objects. The same goes for trees. I haven’t thought extensively about it, but denser data structures don’t seem to be served by the Valhalla update.
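A tiny sketch makes the difference visible. `IntrusiveNode` below is an ordinary identity class that carries its payload and its own `next` link, so splicing is an in-place pointer update. `ImmutableNode` is a record used to approximate a value class, since a record’s fields are final too: the only way to "insert" is to rebuild every node in front of the insertion point. Both types and the `insertAfter` helpers are hypothetical illustrations, not anything from JEP 401.

```java
final class ListSketch {
    // Intrusive, mutable node: payload and link live in the same identity object.
    static final class IntrusiveNode {
        final int x, y, z;
        IntrusiveNode next; // mutable link: only possible because the node has identity

        IntrusiveNode(int x, int y, int z, IntrusiveNode next) {
            this.x = x; this.y = y; this.z = z; this.next = next;
        }
    }

    // A record approximates a value class: every field, including the link, is final.
    record ImmutableNode(int x, int y, int z, ImmutableNode next) {}

    // O(1): rewire two pointers in place.
    static void insertAfter(IntrusiveNode node, IntrusiveNode added) {
        added.next = node.next;
        node.next = added;
    }

    // O(index): copy every node up to and including `index`, then share the tail.
    // Assumes 0 <= index < length(head).
    static ImmutableNode insertAfter(ImmutableNode head, int index, int x, int y, int z) {
        ImmutableNode rest = (index == 0)
                ? new ImmutableNode(x, y, z, head.next())
                : insertAfter(head.next(), index - 1, x, y, z);
        return new ImmutableNode(head.x(), head.y(), head.z(), rest);
    }

    static int length(ImmutableNode head) {
        int count = 0;
        for (ImmutableNode n = head; n != null; n = n.next()) count++;
        return count;
    }

    public static void main(String[] args) {
        IntrusiveNode a = new IntrusiveNode(1, 2, 3, null);
        insertAfter(a, new IntrusiveNode(4, 5, 6, null));
        System.out.println(a.next.x);         // 4: `a` itself was updated in place

        ImmutableNode b = new ImmutableNode(1, 2, 3, null);
        ImmutableNode b2 = insertAfter(b, 0, 4, 5, 6);
        System.out.println(b.next() == null); // true: the original list is untouched
        System.out.println(b2.next().x());    // 4: only the rebuilt copy has the new node
    }
}
```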
Values Really Don’t Have Identities
Ending on a positive note, one of the things I liked about JEP 401 was the attention it pays to mutating a value object. Specifically:
Field mutation is closely tied to identity: an object whose field is being updated is the same object before and after the update
Many years ago, I had an argument with a coworker about Go’s non-reentrant mutex vs. Java’s reentrant synchronizers. As most [civil] arguments go, both of us learned something new: Go’s mutexes can be locked multiple times. Behold!
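package main

import (
	"fmt"
	"sync"
)

func main() {
	var m sync.Mutex
	m.Lock()
	// Overwrite m with a fresh, unlocked Mutex; the old lock state is gone.
	m = *(new(sync.Mutex))
	// Succeeds: this is not the mutex that was locked above.
	m.Lock()
	defer m.Unlock()
	fmt.Println("Hello")
}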
This code shows the problem. The mutex becomes a new object upon reassignment, even though it is the same variable. If the second .Lock() call is removed, this code actually crashes with a fatal "unlock of unlocked mutex" error, even though the Lock call comes before the Unlock and there are the same number of Locks and Unlocks.
JEP 401 is saying the same thing here: mutability implies identity.
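A Java analogue of the Go example, using a record as a stand-in for a value class (the `Counter` and `MutableCounter` types are hypothetical): "incrementing" a record is really rebinding a variable to a new object, so an alias taken earlier never sees the change. Only an identity object, here a plain class with a mutable field, lets two names observe the same update.

```java
final class IdentityDemo {
    // Immutable, like a value object: "mutation" means building a new instance.
    record Counter(int n) {
        Counter increment() { return new Counter(n + 1); }
    }

    // Mutable: updating the field changes the one object every alias points to.
    static final class MutableCounter {
        int n;
        void increment() { n++; }
    }

    public static void main(String[] args) {
        Counter c = new Counter(0);
        Counter alias = c;
        c = c.increment();                           // rebinds `c`; `alias` keeps the old state
        System.out.println(alias.n() + " " + c.n()); // 0 1

        MutableCounter m = new MutableCounter();
        MutableCounter mAlias = m;
        m.increment();                               // same object, so both names see it
        System.out.println(mAlias.n + " " + m.n);    // 1 1
    }
}
```

Like the reassigned Go mutex, the record version has no "same object before and after" to speak of; the variable just points somewhere new.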
Conclusion
At this point, I think the Valhalla branch is interesting, but not enough to carry its own weight. Without being able to see the awesome performance and memory improvements, it’s hard to tell whether the added language and VM complexity is justified.