I’ve found that 99% of the time, when somebody asks a detailed technical question, the right first answer is “What are you really trying to do?” It’s often frustrating for the person who hears it; they think they understand their problem very well and that they’ve reduced it to a single technical point, when in reality the way to go is to approach the issue from a completely different direction. I caught myself in exactly this trap this morning while working on my thesis project, amock.
amock starts with a dynamic analysis of a Java program; it is implemented (for simplicity) as an offline analysis. The program is run in a special instrumented mode which dumps a trace of every method call to a file, and that file is later processed by the actual analysis algorithm. In the trace generated by my instrumentation, every non-primitive object is assigned a unique integer ID, so that the processor can tell which object is which. The instrumentation assigns these IDs using a weak identity hash map: a chimera of the standard Java classes WeakHashMap and IdentityHashMap, which compares keys by reference identity and doesn’t keep them from being garbage collected.
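To make the weak identity hash map idea concrete, here is a minimal sketch of one way to build such a map. This is not amock’s actual code: the class name `ObjectIds` and the method `idFor` are made up for illustration, and the sketch assumes IDs are handed out sequentially starting at 1.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: gives each object a stable integer ID, by identity, without keeping it alive. */
final class ObjectIds {
    /** Weak key that hashes and compares by reference identity instead of equals(). */
    private static final class IdentityKey extends WeakReference<Object> {
        private final int hash;

        IdentityKey(Object referent, ReferenceQueue<Object> queue) {
            super(referent, queue);
            // Cache the hash so the key can still be found after the referent is cleared.
            this.hash = System.identityHashCode(referent);
        }

        @Override public int hashCode() { return hash; }

        @Override public boolean equals(Object other) {
            if (this == other) return true;
            if (!(other instanceof IdentityKey)) return false;
            Object mine = get();
            return mine != null && mine == ((IdentityKey) other).get();
        }
    }

    private final Map<IdentityKey, Integer> ids = new HashMap<>();
    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    private int nextId = 1; // assumption: sequential IDs starting at 1

    /** Returns the ID already assigned to this exact object, or assigns a fresh one. */
    synchronized int idFor(Object o) {
        // Drop entries whose objects have been garbage collected.
        for (Object dead; (dead = queue.poll()) != null; ) {
            ids.remove(dead);
        }
        IdentityKey key = new IdentityKey(o, queue);
        Integer id = ids.get(key);
        if (id == null) {
            id = nextId++;
            ids.put(key, id);
        }
        return id;
    }
}
```

The identity comparison is what matters for tracing: two distinct strings with the same contents are `equals()` to each other but must still get different IDs, and the weak references mean the tracer doesn’t change which objects the traced program keeps alive.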
The problem is, this doesn’t work for creating trace entries about constructors. When you’re calling a constructor, the semi-initialized object is this weird thing that you basically can’t examine at all… trying to do basically anything to it other than ‘dup’ it or call a constructor on it makes the verifier cry. Thus my instrumentation has to go to extra effort to substitute a dummy ConstructorReceiver object into the calls to my tracer implementation, because you can’t actually pass a semi-initialized object to methods. This means that my trace entries for entering-constructor all have no idea which object they’re constructing, and there’s no good way to tell from the trace whether two different entering-constructor entries refer to the same object (calling superclass constructors on it, say) or not. (The exiting-constructor entries do have an identity for the object, though, because the object is initialized by then.)
So this morning I spent about an hour poring over the JVM spec and trying to come up with complicated instrumentations that would extract some information from this annoying uninitialized object. I had several different plans; for example, I started trying to work out how to instrument every single constructor in the system to add an extra object-id argument. I went to one of my labmates and tried to get advice on how to instrument the Java bytecode in order to somehow get this to work.
“What are you really trying to do?” he asked.
So I started explaining from the beginning. And suddenly it hit me: I didn’t need a fancier instrumentation. All I needed to do was write a braindead two-pass algorithm that reads in a trace file, matches entering-constructor and exiting-constructor entries, and spits out a trace with the additional information in it. It took me ten minutes and worked the first time.
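For the curious, a two-pass fix of this kind can be sketched in a few lines. Everything about the trace format here is an assumption made for illustration (amock’s real trace format is not shown in this post): entries are lines of the form `enter-ctor <class>` and `exit-ctor <class> <id>`, the trace comes from a single thread, and constructor entries are properly nested. The class name `ConstructorMatcher` is likewise hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: copies object IDs from exit-ctor entries back onto the matching enter-ctor entries. */
final class ConstructorMatcher {
    static List<String> annotate(List<String> trace) {
        // Pass 1: pair each enter-ctor with its exit-ctor using a stack,
        // and remember the exit's object ID under the enter's line index.
        Map<Integer, String> idForEnterLine = new HashMap<>();
        Deque<Integer> open = new ArrayDeque<>();
        for (int i = 0; i < trace.size(); i++) {
            String[] fields = trace.get(i).split(" ");
            if (fields[0].equals("enter-ctor")) {
                open.push(i);
            } else if (fields[0].equals("exit-ctor")) {
                if (open.isEmpty() || fields.length < 3) {
                    throw new IllegalArgumentException("bad exit-ctor entry at line " + i);
                }
                idForEnterLine.put(open.pop(), fields[2]);
            }
        }

        // Pass 2: re-emit the trace, appending the ID to every matched enter-ctor entry.
        // Unmatched enters (e.g. a constructor that threw) are passed through unchanged.
        List<String> out = new ArrayList<>();
        for (int i = 0; i < trace.size(); i++) {
            String id = idForEnterLine.get(i);
            out.add(id == null ? trace.get(i) : trace.get(i) + " " + id);
        }
        return out;
    }
}
```

With this, a subclass constructor and the superclass constructor it calls both end up tagged with the same ID, which is exactly the information the entering-constructor entries were missing.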