Wednesday, 28 March 2012

Monkey Tip - Be Careful Around Boxes

No, this isn't a warehouse safety message about lifting with your legs. Monkey features auto-boxing/unboxing between classes and primitive types. Coders with backgrounds in other OO languages like Java will probably be familiar with the concept but I imagine there are plenty of Monkey users who aren't, so it's probably worth explaining it a little.

Boxes? Huh?


The term "box" means a class that contains, or wraps, a primitive value type. The standard examples in Monkey are the IntObject, FloatObject, BoolObject and StringObject classes that you'll find in the boxes file in the standard monkey module. These classes are generally used so that primitives can be treated as object instances and vice versa. The auto part of the boxing operations allows you to write code like this:

Local i:IntObject = 1
i += 1

Why would you want to do this? Well, in Java this sort of thing is often employed so that primitives can be used with collection classes that require object references. Monkey's standard collection classes take a different route and don't use boxes, but it's likely that you'll run into libraries with collection classes that do (Diddy, for example). Other than that, it's sometimes useful to be able to declare that a class has a primitive representation or to create "smart" primitives. My JSON library makes use of boxing so that you can easily retrieve primitive values from the JSON structures. For example:

Local x:Float = jsonObject.GetItem("x")
Local y:Float = jsonObject.GetItem("y")
Local isActive:Bool = jsonObject.GetItem("isActive")
Local name:String = jsonObject.GetItem("name")

So, Monkey's automatic boxing and unboxing deals with the chore of converting between the representations and, in theory, the code is easier to write and read. I say "in theory" because the syntax is hiding what is actually going on. It's an example of what some refer to as syntactic sugar or, less flatteringly, "magic". It pays to understand what's actually happening, especially if you intend to use the feature yourself, as the reality can trip you up.

Pulling Back the Curtain


Let's demystify this language magic a little bit. What actually happens in the first example above? Here are those two lines of code compiled to C#:

bb_boxes_IntObject t_i=((new bb_boxes_IntObject()).g_IntObject_new(1));
t_i=((new bb_boxes_IntObject()).g_IntObject_new((t_i.m_ToInt())+1));

The first line is relatively straightforward -- the assignment is converted to a constructor for an IntObject instance.

The second line is a bit more interesting. It shows that the simple += operation is converted into a call to the ToInt method on our previously constructed IntObject, an addition and then another constructor call to create a new IntObject instance with the summed value.
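Written out by hand in Monkey, the compiled code above corresponds roughly to the following. This is only a sketch of the equivalent source, not translator output, and the Main function is there just to make it a complete program.

```monkey
Function Main:Int()
    ' What "Local i:IntObject = 1" amounts to: an explicit constructor call.
    Local i:IntObject = New IntObject(1)

    ' What "i += 1" amounts to: unbox, add, then box into a NEW instance.
    i = New IntObject(i.ToInt() + 1)

    Print "i = " + i.ToInt()
    Return 0
End
```

Note that the second statement does not modify the original IntObject. It replaces the reference with a freshly allocated one.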

And that's actually all there is to boxing and unboxing. It's simply a matter of implementing certain method signatures for construction with, and returning of, the boxed value. In general form they are:

Method New( value:{PrimitiveType} )
Method To{PrimitiveType}:{PrimitiveType}()

Where {PrimitiveType} is replaced with Int, Float, Bool or String, e.g. Method ToInt:Int(). All you have to do is implement those methods on any class and, presto, you've got an auto-boxing class. You don't have to implement all of them. For instance, it is very common to implement only ToString, which provides a useful shortcut for debug prints like:

Print "My class info is: " + myBox
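To make the method signatures concrete, here is a minimal sketch of a home-made box. The Health class and its members are hypothetical names invented for this illustration. They are not part of the monkey module or any library.

```monkey
' Hypothetical example class: a "smart" Int that boxes and unboxes itself.
Class Health
    Field value:Int

    ' Boxing: lets an Int be assigned where a Health is expected.
    Method New(value:Int)
        Self.value = value
    End

    ' Unboxing: lets a Health be used where an Int is expected.
    Method ToInt:Int()
        Return value
    End

    ' Unboxing to String, handy for debug prints.
    Method ToString:String()
        Return "Health(" + value + ")"
    End
End

Function Main:Int()
    Local h:Health = 100            ' boxed via New(value:Int)
    Local n:Int = h                 ' unboxed via ToInt()
    Print "Debug: " + h             ' unboxed via ToString()
    Print "As an Int: " + n
    Return 0
End
```

Leaving out ToInt would keep the debug print working but make the assignment to n a compile error, which is one way to limit how much "magic" a class exposes.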

So what are the problems?

Performance


The first one may have been obvious from the simple IntObject example above. Boxing and unboxing are much more expensive than working with the primitive types directly. Every assignment constructs a new instance, and that can lead to high garbage collection costs if you use boxed types without care. If you need to process boxed values somewhere performance sensitive then you should unbox them, work with the primitive values and then assign the results back to the boxed versions.
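As a rough sketch of that advice, compare the two loops below. The loop bounds and variable names are arbitrary, and the first loop relies on "+=" working with boxed values, which may not be the case in every Monkey version.

```monkey
Function Main:Int()
    Local total:IntObject = 0

    ' Slow: each pass unboxes total, adds, then allocates a new IntObject.
    For Local i:Int = 1 To 1000
        total += i
    Next

    ' Faster: unbox once, work on the primitive, box once at the end.
    Local sum:Int = total.ToInt()
    For Local i:Int = 1 To 1000
        sum += i
    Next
    total = sum

    Print "Total: " + total.ToInt()
    Return 0
End
```

The second loop allocates a single IntObject no matter how many iterations it runs, whereas the first allocates one per iteration.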

Compatibility


Be aware that auto-boxing is a Monkey feature that didn't arrive fully formed. In particular, the demo version of Monkey is currently at 45c and some parts of auto-boxing don't work in that version, e.g. the "+=" operator from above. If you're releasing code for others to use then this may mean that you have to field bug reports and complaints.

True May No Longer Be True and False May Be a Null Pointer Exception


When you implement the box methods you are potentially changing how Monkey will interpret operations on objects of that class. A good example is the use of the following as a test for an uninitialised reference:

If myInstance
    DoAThing()
End

Normally, that code would execute DoAThing() only if myInstance had been initialised, and not if it was uninitialised or set to Null. However, if myInstance's class implements the ToBool method then Monkey will invoke it and attempt to unbox the instance to test its boolean value. This will, of course, crash with a null reference exception of some kind if myInstance is uninitialised.

A similar problem can occur when doing type-testing via the casting mechanism:

If MyClass(myInstance)
    DoAThing()
End

Again, if MyClass declares ToBool then this will result in an attempt to unbox after the cast. If myInstance isn't an instance of MyClass then you'll get a null reference error. The answer to both of these is to explicitly test against Null, e.g.

If MyClass(myInstance) <> Null

That way Monkey doesn't attempt to unbox to a Bool. In fact, I'd recommend getting into the habit of typing out the explicit test, as this is a potential cause of bugs that are very difficult to find.
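Here is a small sketch of the trap and the safe pattern side by side. The Flag class is a hypothetical example written for this post, not a library type.

```monkey
' Hypothetical class that unboxes to Bool.
Class Flag
    Field active:Bool

    Method New(active:Bool)
        Self.active = active
    End

    Method ToBool:Bool()
        Return active
    End
End

Function Main:Int()
    Local f:Flag    ' never assigned, so it is Null

    ' Writing "If f" here would call f.ToBool() on a Null reference.
    If f <> Null
        Print "f is initialised"
    Else
        Print "f is Null"
    End

    f = New Flag(False)

    ' "Is it there?" and "is it true?" are now two separate questions.
    If f <> Null
        If f.ToBool()
            Print "f is set and True"
        Else
            Print "f is set but False"
        End
    End
    Return 0
End
```

Keeping the reference test and the value test as separate, explicit steps makes it obvious to the next reader which question each line is asking.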

One May Not Equal One


Another gotcha to be aware of around boxed primitives and logic is that the "=" comparison won't cause values to be unboxed.

Local a:IntObject = 1
Local b:IntObject = 1

If a=b
    Print "This is never printed (unless Monkey starts to do interning and then it might be!)"
End

In the example above, a is not equal to b because the comparison tests whether a and b are references to the same object and not whether they box the same value. To check the values you would have to unbox at least one of them yourself to make the comparison based on the integer value. Stylistically, I'd suggest explicitly unboxing both: "If a.ToInt() = b.ToInt()"
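If you compare boxed values in more than one place, it may be worth wrapping the explicit unboxing in a helper. SameValue below is a hypothetical function for illustration, not something provided by the monkey module, and treating a Null argument as "not equal" is just one possible choice.

```monkey
' Hypothetical helper: compares two IntObjects by value, not by reference.
Function SameValue:Bool(a:IntObject, b:IntObject)
    ' A Null reference boxes nothing, so treat it as unequal to everything.
    If a = Null Or b = Null Then Return False
    Return a.ToInt() = b.ToInt()
End

Function Main:Int()
    Local a:IntObject = 1
    Local b:IntObject = 1

    If a = b Then Print "a and b are the same object"
    If SameValue(a, b) Then Print "a and b box the same value"
    Return 0
End
```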

I'll end part one of this post here. Yes, there's more.
