Thursday 21 June 2012

Monkey's Garbage Collection on iOS

Not long ago I finally got around to buying the bits needed for iOS builds: a new iPod Touch, a ridiculously priced block of white plastic that runs OSX and Apple's 99 USD blessing to run my own code on the device I've already paid for. Besides the money there was a certain cost in time and frustration getting everything up-to-date and working but it is done and I can now build my project and run it on my iPod.

Well I guess this is enough to let you in the club.

I never really swallowed Monkey's promise of write-once and run pretty much anywhere so I wasn't expecting completely smooth sailing. Sure enough there have been a few wrinkles to iron out: graphics differences, an odd lack of response to touch events and performance issues - including the garbage collection (or GC for short) wrinkle that this post is about.

Some time ago someone else tested my Box2D port on their iOS device and posted on the Monkey forums that they were seeing poor frame-rates. I've not heard anything since but it stuck in my mind as a concern so I was specifically looking for performance issues when I first tried my game on my iPod. Happily the performance looked fine. I played a level with a reasonable number of entities in it and it seemed to be keeping a steady 30fps. Then I restarted the level a few times and everything got slower. A few more restarts and it was barely responding to input.

I'm Not The Only Problem Here


It was clear that I had a resource leak that was somehow holding onto game data between level restarts. What was unclear was why this was only happening on iOS or at least only causing a slow-down on iOS. An internal game bug that collected actual processing elements, such as holding on to multiple active instances of the physics simulation, shouldn't only occur on iOS so it must be something else. It seemed as good a time as any to take a look at the XCode profiler so I fired it up and re-ran my test of restarting a level several times. Not much digging was required. The game was spending increasing amounts of time in the garbage collection code marking objects.

Here's a quick explanation of that last sentence. Monkey is a garbage collected language. That means that if you create an object you don't need to explicitly free or destroy that object when you're done with it. When the reference goes out of scope or otherwise becomes unreachable by your code Monkey makes the promise that the memory it uses will be recovered.

Now several of Monkey's target languages/platforms provide their own garbage collection implementations: Java (Android), Javascript (HTML5), C# (XNA) and Flash. When building to those targets that's what your game will use. The C++ targets (GLFW and iOS) have no native GC implementation and so one is provided in the Monkey code itself. All well and good.

Garbage collectors come in many different flavours but Monkey's C++ GC is essentially a mark and sweep collector. What this means is that when the collector runs it first traverses the tree of object references in your application and "marks" those that are reachable, then it reclaims the memory or "sweeps" those that weren't marked. What the profiler was telling me was that my game was spending more and more time doing that "marking" process.

As I said, it was clear that I had a resource leak and it didn't take me long to track that down. The leak was resulting from a failure to remove one of my UI elements from a list of mouse event listeners. As the list existed for the lifetime of the game the UI element and everything attached to it remained reachable and so were never collected.

It's Already Spotless. You Can Stop Cleaning Now.


Problem solved? Only sort of. Once I'd fixed the leak I re-ran the test on my ipod and was unhappy to see that the marking process was still the biggest single chunk of time per frame. In fact I built a test that did nothing more than create and hold a bunch of objects and show the time spent in marking -- it seems that there's an approximate cost of approx 1ms per 10,000 live objects per frame on my 4th gen iPod. That's not talking about collecting objects that have been discarded, that's the standing cost of just having those objects. On every frame the Monkey code walks the tree of objects and marks them all and it takes that long to do it. Here's a graph of the average per frame cost:
Click to embiggen.

Why is this a problem? It's a problem because this behaviour is different from all the other GC implementations. Let's say you create a million map cell objects in your game and then don't allocate anything else while your game is running. If you do that on any of the other targets the GC cost will probably be zero. On iOS it would be crippling (on GFLW desktop hardware the marking process is generally so fast that it's not a problem, but it's just a question of scale, not principle).

How many objects are likely to be in a game? It totally depends but it's really not that difficult to get up there. My game is just a physics platformer with quite small levels and it runs up 20-30 thousand live objects in some situations. I could certainly do things to reduce that number, but it's kind of strange that I would have to do so just to bend to the behaviour of the garbage collector.

Trying To Be Helpful But Just Getting In The Way?


There is an argument for aggressive garbage collection and it's that you minimise the pauses seen in other garbage collectors. If you look at Android, for example, it's not unusual to see the garbage collector stop your application for a hundred milliseconds or more while collections take place. In theory, if you collect frequently, the interruptions will be smaller and more consistent and avoid pauses that are noticeable to the user.

The theory breaks down when I look at the iOS numbers, however:

Firstly, the load isn't that smooth. One frame might take a 1ms hit, another will take a 7ms hit. If you're close to the frame limit then that's easily enough variance to give you issues.

Secondly, and more importantly, the behaviour means that you can't escape the GC cost and if you're trying to do so on other platforms then you may well be making things worse on iOS. With the other GC implementations you can minimise their impact by not allocating objects during your game loop. On iOS that doesn't matter because you pay for every object you have live. What that means is that object pooling or caching strategies will benefit you on Android and actively work against you on iOS. This is not a recipe for cross-platform consistency.

What To Do?


Option one is to accept the situation and reduce your numbers of objects. If you can do that then it's not a bad thing to do in general, it just won't gain you much outside of iOS. There comes a point when avoiding using objects becomes an unreasonable expectation in an object-oriented language though.

Option two is to start performing surgery on the Monkey garbage collector. Clearly this is way over the line of abstracted cross-platform development but needs must when the devil drives and all that. An obvious change to make is simply to add a memory allocation trigger to the GC process: "Don't do anything unless x objects or x kilobytes of memory have been allocated". This would make it behave somewhat more like the standard Java collector in that you can expect strategies like object pooling to reduce garbage collector activity rather than increase it.

That memory allocation trigger could be extended all the way to "Don't do anything unless a memory allocation fails" but doing so would guarantee expensive collections and wouldn't work with the way that the GC currently attempts to spread the cost over a number of frames.

What I'll probably do is wait until the game is pretty much complete and then see if I can reduce allocations/tweak a collection trigger so that levels can be played through without a collection. Then I can add a call to force a collection to occur when the game is transitioning to the results screen or somewhere else where the player won't be bothered by a frame-rate hiccup.

If/when I settle on something I'll probably post the code for others to use. That's if the standard collector isn't altered before then. In the meantime my advice is to just be aware that the GC on iOS behaves differently to that on Android/Flash/XNA and the difference could be problematic if your game is large and/or has tight performance requirements.