Thursday 21 June 2012

Monkey's Garbage Collection on iOS

Not long ago I finally got around to buying the bits needed for iOS builds: a new iPod Touch, a ridiculously priced block of white plastic that runs OS X and Apple's 99 USD blessing to run my own code on the device I've already paid for. Besides the money there was a certain cost in time and frustration in getting everything up-to-date and working, but it is done and I can now build my project and run it on my iPod.

Well I guess this is enough to let you in the club.

I never really swallowed Monkey's promise of write-once and run pretty much anywhere so I wasn't expecting completely smooth sailing. Sure enough there have been a few wrinkles to iron out: graphics differences, an odd lack of response to touch events and performance issues - including the garbage collection (or GC for short) wrinkle that this post is about.

Some time ago someone else tested my Box2D port on their iOS device and posted on the Monkey forums that they were seeing poor frame-rates. I've not heard anything since but it stuck in my mind as a concern so I was specifically looking for performance issues when I first tried my game on my iPod. Happily the performance looked fine. I played a level with a reasonable number of entities in it and it seemed to be keeping a steady 30fps. Then I restarted the level a few times and everything got slower. A few more restarts and it was barely responding to input.

I'm Not The Only Problem Here


It was clear that I had a resource leak that was somehow holding onto game data between level restarts. What was unclear was why this was only happening on iOS, or at least only causing a slow-down on iOS. An internal game bug that collected actual processing elements, such as holding on to multiple active instances of the physics simulation, shouldn't only occur on iOS, so it had to be something else. It seemed as good a time as any to take a look at the Xcode profiler, so I fired it up and re-ran my test of restarting a level several times. Not much digging was required. The game was spending increasing amounts of time in the garbage collection code marking objects.

Here's a quick explanation of that last sentence. Monkey is a garbage collected language. That means that if you create an object you don't need to explicitly free or destroy it when you're done with it. When the reference goes out of scope or otherwise becomes unreachable by your code, Monkey makes the promise that the memory it uses will be recovered.
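In practice that looks like this (a trivial sketch; the Enemy class is just a stand-in):

Class Enemy
End

Function Main:Int()
    Local e:Enemy = New Enemy   ' the instance is reachable through e
    e = Null                    ' now nothing references it, so the GC
                                ' is free to reclaim the memory later
    Return 0
End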

Now several of Monkey's target languages/platforms provide their own garbage collection implementations: Java (Android), Javascript (HTML5), C# (XNA) and Flash. When building to those targets that's what your game will use. The C++ targets (GLFW and iOS) have no native GC implementation and so one is provided in the Monkey code itself. All well and good.

Garbage collectors come in many different flavours but Monkey's C++ GC is essentially a mark and sweep collector. What this means is that when the collector runs it first traverses the tree of object references in your application and "marks" those that are reachable, then it reclaims the memory or "sweeps" those that weren't marked. What the profiler was telling me was that my game was spending more and more time doing that "marking" process.

As I said, it was clear that I had a resource leak and it didn't take me long to track that down. The leak was resulting from a failure to remove one of my UI elements from a list of mouse event listeners. As the list existed for the lifetime of the game the UI element and everything attached to it remained reachable and so were never collected.
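The shape of the bug looked something like this (a sketch with hypothetical names, not my actual UI code):

Interface MouseListener
    Method OnMouseEvent:Void( x:Int, y:Int )
End

Class ScoreDisplay Implements MouseListener
    Method OnMouseEvent:Void( x:Int, y:Int )
    End

    Method Show:Void( listeners:List<MouseListener> )
        listeners.AddLast( Self )   ' the list lives for the whole game
    End

    Method Hide:Void( listeners:List<MouseListener> )
        ' This was the missing call: without it the display, and
        ' everything hanging off it, stays reachable and is never swept
        listeners.RemoveEach( Self )
    End
End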

It's Already Spotless. You Can Stop Cleaning Now.


Problem solved? Only sort of. Once I'd fixed the leak I re-ran the test on my iPod and was unhappy to see that the marking process was still the biggest single chunk of time per frame. In fact I built a test that did nothing more than create and hold a bunch of objects and show the time spent in marking -- it seems that there's a cost of roughly 1ms per 10,000 live objects per frame on my 4th gen iPod. That's not talking about collecting objects that have been discarded, that's the standing cost of just having those objects. On every frame the Monkey code walks the tree of objects and marks them all, and it takes that long to do it. Here's a graph of the average per frame cost:
Click to embiggen.
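For reference, the core of that test was nothing fancier than this (a sketch, not the exact harness; the per-frame mark time came from the profiler rather than anything in the snippet):

Class Node
End

' 100,000 live objects held for the life of the app and never touched
' again; on the C++ targets the per-frame mark cost scales with this
Global held:Node[] = New Node[100000]

Function FillHeldObjects:Void()
    For Local i:Int = 0 Until held.Length
        held[i] = New Node
    Next
End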

Why is this a problem? It's a problem because this behaviour is different from all the other GC implementations. Let's say you create a million map cell objects in your game and then don't allocate anything else while your game is running. If you do that on any of the other targets the GC cost will probably be zero. On iOS it would be crippling (on GLFW desktop hardware the marking process is generally so fast that it's not a problem, but that's a question of scale, not principle).

How many objects are likely to be in a game? It totally depends but it's really not that difficult to get up there. My game is just a physics platformer with quite small levels and it runs up 20-30 thousand live objects in some situations. I could certainly do things to reduce that number, but it's kind of strange that I would have to do so just to bend to the behaviour of the garbage collector.

Trying To Be Helpful But Just Getting In The Way?


There is an argument for aggressive garbage collection and it's that you minimise the pauses seen in other garbage collectors. If you look at Android, for example, it's not unusual to see the garbage collector stop your application for a hundred milliseconds or more while collections take place. In theory, if you collect frequently, the interruptions will be smaller and more consistent and avoid pauses that are noticeable to the user.

The theory breaks down when I look at the iOS numbers, however:

Firstly, the load isn't that smooth. One frame might take a 1ms hit, another will take a 7ms hit. If you're close to the frame limit then that's easily enough variance to give you issues.

Secondly, and more importantly, the behaviour means that you can't escape the GC cost and if you're trying to do so on other platforms then you may well be making things worse on iOS. With the other GC implementations you can minimise their impact by not allocating objects during your game loop. On iOS that doesn't matter because you pay for every object you have live. What that means is that object pooling or caching strategies will benefit you on Android and actively work against you on iOS. This is not a recipe for cross-platform consistency.
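To make that concrete, here's the sort of pool I mean (a sketch; Bullet is a stand-in for whatever you're recycling):

Class Bullet
End

Class BulletPool
    ' Recycling cuts allocation churn, which calms the Android GC, but
    ' every pooled instance stays live, which is exactly what the iOS
    ' marker charges you for on every frame
    Field pool:List<Bullet> = New List<Bullet>

    Method Obtain:Bullet()
        If Not pool.IsEmpty() Then Return pool.RemoveLast()
        Return New Bullet
    End

    Method Recycle:Void( b:Bullet )
        pool.AddLast( b )
    End
End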

What To Do?


Option one is to accept the situation and reduce your numbers of objects. If you can do that then it's not a bad thing to do in general, it just won't gain you much outside of iOS. There comes a point when avoiding using objects becomes an unreasonable expectation in an object-oriented language though.

Option two is to start performing surgery on the Monkey garbage collector. Clearly this is way over the line of abstracted cross-platform development but needs must when the devil drives and all that. An obvious change to make is simply to add a memory allocation trigger to the GC process: "Don't do anything unless x objects or x kilobytes of memory have been allocated". This would make it behave somewhat more like the standard Java collector in that you can expect strategies like object pooling to reduce garbage collector activity rather than increase it.
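In Monkey-style pseudocode the idea looks like this (the real collector lives in the C++ runtime, so this is a sketch of the logic rather than a patch):

' Arbitrary illustrative threshold
Const TRIGGER_BYTES:Int = 256 * 1024

Global allocatedSinceCollect:Int = 0

Function OnAllocated:Void( bytes:Int )
    allocatedSinceCollect += bytes
End

Function MaybeCollect:Void()
    ' Skip the whole mark/sweep until enough garbage could even exist
    If allocatedSinceCollect < TRIGGER_BYTES Then Return
    Mark()
    Sweep()
    allocatedSinceCollect = 0
End

' Stubs standing in for the real mark and sweep phases
Function Mark:Void()
End

Function Sweep:Void()
End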

That memory allocation trigger could be extended all the way to "Don't do anything unless a memory allocation fails" but doing so would guarantee expensive collections and wouldn't work with the way that the GC currently attempts to spread the cost over a number of frames.

What I'll probably do is wait until the game is pretty much complete and then see if I can reduce allocations/tweak a collection trigger so that levels can be played through without a collection. Then I can add a call to force a collection to occur when the game is transitioning to the results screen or somewhere else where the player won't be bothered by a frame-rate hiccup.

If/when I settle on something I'll probably post the code for others to use. That's if the standard collector isn't altered before then. In the meantime my advice is to just be aware that the GC on iOS behaves differently to that on Android/Flash/XNA and the difference could be problematic if your game is large and/or has tight performance requirements.

Monday 21 May 2012

Tetris In... How Long?

Yes, I know I write very long blog posts. Feel free to skim. There's a playable thingy linked below!

The Preamble


I've been following Andy Moore's blogging, vlogging, podcasting and social networking output for a fair while. Andy is the dev behind Steambirds and has recently started iOS dev with Iceburgers. He's often got interesting things to say about his life as an indie, information about the business aspects, technical tips for AS3 coders and he's also involved in a lot of face-to-face social stuff (I'm neither an AS3 coder nor social, but it's fun to hear about them from a safe distance). As far as I can tell, he's an all-round decent chap.

But I'm not liable if he kills you and wears your skin as pyjamas.
However, one thing he's mentioned a few times in the past has always left me kind of perplexed: doing game-jams on very short timescales. I'm not part of the game-jam scene. I've started a few 48-hour jams in the past and sort of finished one of them. I normally take a look at the Ludum Dare theme to see if I'm inspired but I haven't yet been inspired enough to spend a weekend in crunch. I can at least see that you can create a complete game in 48 hours though. Andy talks about writing a game in five minutes.

My thoughts? In a word: "bollocks". I didn't think he was lying -- he strikes me as a very honest person. I just had to assume that there was some glossing over of the reality of what comes before that five minutes and that there would be a large gap between my expectations of something I'd call "a game" and what he was talking about creating. Could it be a bit of fun? Maybe. Some sort of lesson about scope or perhaps about using short goal-oriented sprints to get past motivation issues or to test simple ideas? Perhaps.

The Actual Thing


So, anyway, Andy's been on a bit of a video spree recently. He's started a new vlogging series, which you should check out, and he's been doing a bit of live streaming. A few days ago he did a stream with a five minute dev challenge, so I got a chance to see what was up. Well, it was pretty much what I thought. The starting point was a template project ready to go and the end result was moving a block with the mouse and a debug print when that block collided with some other blocks. That's no criticism of Andy as it's pretty brave to try such a thing in front of a live audience. To be honest I was relieved not to be proven wrong. Frankly it would have been awful if he'd whipped up a complete platformer or something in the time.

Harsh, but I told you he was honest.
This morning I noticed that Andy had tweeted that he was going to do another live stream; this time to build Tetris in fifteen minutes. To me this was much more interesting than just "a game" in five minutes. Tetris is a game that we're familiar with. It's not very complicated, but it's not utterly trivial either. I've never implemented it before, but it's something that I'd suggest as a project for someone just starting out with games coding and expect them to take a few days or maybe a week or so to pull something together. I'd imagine that an experienced coder could turn a basic, unpolished but technically solid version around in a day. How quickly could you just hack it together though? Surely not fifteen minutes?

I decided to see what I could do with plain out-of-the-box Monkey and then compare with Andy's run at it. Handily, he put a countdown timer on his stream, which meant that I could kick off at t-15 and see how far I got. Before I started, I quickly thought of what makes up a valid "Tetris" to me:
  • Grid x by y (I couldn't recall the dimensions)
  • All the Tetris shapes in differing colours
  • Random piece generation plus next piece indicator
  • Pieces can be moved left, right and down, and can be rotated
  • Shape movement is locked to grid (vertical drop visual increment can be smaller, but left and right are snapped to the grid and blocks can only settle at a snapped position) 
  • Line removal
  • Fail state when a block settles at the top

There are probably some bits missing that Tetris fans would insist on and maybe some parts could be argued as irrelevant to the core design on the other end but I think it's a reasonable starting point. Here's how the fifteen minutes went:

2 minutes: Creating new Jungle IDE project. Laptop is being slow so it takes a while.
4 minutes: Written the main function, app template and an outline Grid and Piece class.
5 minutes: Jungle starts flaking out. Throws error dialogs when trying to autocomplete type definitions.
7 minutes: I decide that Jungle is borked and restart it.
8 minutes: Jungle restarts and informs me there's an update. I decide to take it in case it resolves the instability I just experienced.
12 minutes: I've got the grid render written but untested.
14 minutes: Base definitions for five pieces added. Not sure if the structure is good.
15 minutes: I've added left, right and drop movement and the code for instantiating a random piece from the set, and started looking at the collision. Time's up. I haven't even compiled yet.

So much for that. I watch Andy make his attempt. Andy gets something up on screen, which is more than I do. Essentially, he gets a block that follows the mouse. You can drop blocks and they will stack. It's very noticeable that he pays very little attention to most of the stuff I listed above. For example, I wrote multiple shapes from the start and grid-based movement while Andy did neither. His focus was entirely on getting something on screen, not on getting Tetris on screen.

Again, this was pretty much what I expected and I'm relieved that he didn't blow me out of the water. Thumbs up from me for even trying it on a live stream. I left the stream not long after (need to preserve the bandwidth down here in NZ) but before I did there were a few suggestions for extra increments and Andy seemed to admit that he hadn't considered elements like row removal in what he had written. Clearly we had come at the problem from entirely different angles. If anything, we saw completely different problems.

I went off and toasted a bagel, made myself a coffee and thought about it a bit more. I was kind of proven right that you can't really write Tetris in fifteen minutes, but how long would it take to complete my list of functional requirements? I could afford a morning to see how far I got, so that's what I did. I got myself a little timer app and started doing micro-iterations. I stopped the timer if I got interrupted or when I got to a stage where it was worth making a note or taking a screen grab. The approach was to meet the functional requirements but to ignore any desires to pretty the code unless it helped me get where I was going faster. Here are my notes:

20 minutes: After umm-ing and ahh-ing over collision for a minute I just hacked in the first thing that came to mind. It won't handle rotation though.

22 minutes: Attempt first compile expecting to see block dropping down the screen. Of course there are compilation errors.

25 minutes: Errors cleared. I have to fix them one at a time thanks to Monkey's "something's wrong, I give up!" attitude to parsing. It runs, but I've got a nice blank screen for my troubles.

28 minutes: I had forgotten to set the update rate, which on HTML5 results in no updates at all. I'd also neglected to consider that the active piece needs to be added to the grid as it moves or be drawn separately. I implemented code for the piece to remove and re-add itself as it moves. I don't like it. Still, there's something on screen. As soon as I saw it I realised that I hadn't put in detection for hitting the floor. Also, the size is clearly horribly wrong.

Awesome, isn't it?

34 minutes: I now have blocks dropping, colliding, settling and a new block coming in at the top (although it doesn't appear in the right place).

43 minutes: Added left, right and drop movement. Code starting to get hairy with ugly bounds checks and cut and paste. Although the movement works, there's a bug where I get to keep moving old pieces.

56 minutes: I've now got row removal. The code is a mess though. Also fixed the bug from the last iteration, which was caused by me failing to create copies of the base pieces so I was recycling references.

Look, one of each. That's amazing... oh, it's a bug.

58 minutes: Added a fail state. All it does is check if a spawned piece is already colliding. If it is then it prints "Failed" to the console and re-initialises the grid. Not very satisfying and probably different from Tetris proper but it matches the requirement. Rotation next. This will be quite a big chunk, I think.

1h16m: I've got rotation (one way) apparently working. I'm totally unconvinced by the code though. If I wasn't working against a ticking clock and actually intended to build a Tetris for release I'd definitely stop at this point and rethink. I've done it by changing all shape definitions to be 4x4. These are rotated clockwise and then the shape is moved to the top left of the grid again. It's quite inelegant. Also, I've just realised that it doesn't account for collisions on rotation.
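For the curious, the rotation boils down to this (a cleaned-up sketch of the idea rather than the actual jam code):

' Rotate a 4x4 shape definition 90 degrees clockwise: the cell at
' (row, col) ends up at (col, 3 - row)
Function RotateCW:Int[][]( shape:Int[][] )
    Local result:Int[][] = New Int[4][]
    For Local row:Int = 0 Until 4
        result[row] = New Int[4]
    Next
    For Local row:Int = 0 Until 4
        For Local col:Int = 0 Until 4
            result[col][3 - row] = shape[row][col]
        Next
    Next
    Return result
End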

1h21m: Just fixed the rotation so that it won't rotate into a colliding position. I also added the two "L" shaped blocks that I missed from the Tetris set. I just need the next piece indicator now as far as features go.

1h31m: I've got the next indicator in. It's not pretty and the code to do it is a chuckle-worthy hack for re-using the existing grid code. However, we're now feature complete. Now to do a bit of testing and look at some pictures of actual Tetris to see about grid dimensions and stuff.

Fat and grey. Like creator, like creation, I guess.
1h50m: Right, a bit of research dug out the standard play area size (10x20) and the block colours. Getting the block colours involved picking them off an image from Wikipedia. That required me to find my "everywhere eyedropper" tool that I rarely use and then paste all the colours into the code, correct the formatting and write the bit to reference the palette. Hence nearly twenty minutes. Also, I've found a crash bug when you lose, which I shall now have to fix...

1h55m: Fixed the crash bug, and also moved the spawn to the middle. However, I also noticed a bug where pieces weren't colliding properly on the right side. I fixed that and then realised that it now stops you rotating a piece on the right if it goes off the playfield. I'm sure in the original that it simply shifts the piece over in that scenario.

2h04m: Okay, I've added a check to see if the rotated piece is hanging off the right and, if so, it attempts to move it in. The piece still won't rotate if it's blocked by another Tetris piece on the right but I'm less sure if the original does this. As I'm also unsure of the precise row removal rules I'm going to leave it here as "job done".

The Result

I guess it's done... hey, look the next piece is a long one. Do we need multi-line bonuses?
If you want to play it, you can do so here in HTML5 and here built to Flash. The controls are up arrow to rotate, left and right arrows for the obvious and down is a surprise... not really. If you want the code for some insane reason, it's here.

The Opinionated Bit


So, based on that, and assuming that I can at least be considered in the range of "average" as a programmer, the realistic timescale for banging out a recognisable and playable Tetris clone is about two hours. To be honest, considering I stopped the timer if I went to the loo, made a coffee or took a break and played Portal 2 for half an hour (and those all allow a bit of subconscious processing), I'd put it higher.

If we throw out "realistic" and talk about "possible", what then? If you discard the issues I had with my tools, then 1h45m. If you discard my fussiness about the colours, playfield size and such then maybe 1h15m. If you're just hugely better than me and/or have written very similar code before then I'm sure you could do it in under an hour, but I think we're getting into an area where the great majority of people who try are going to fail to come close. The only way I can believe you'd do it in 15 minutes is if you're pretty much just typing it from memory, and that's not development as far as I'm concerned.

So, two hours for Tetris. Still quite fast, eh? Here's the thing though. While what I've got is "Tetris" it's so far away from being a releasable game that it's unreal. Firstly, the code sucks. Most of it is right on the brink of being more expensive to continue with than to rewrite. Even working on my own I'd be unable to face leaving it as is and there's no way it'd stay that way if the code was shared.

Beyond the technical debt there are still a whole bunch of missing features: there's no score or length of time played to provide achievement; no score-board to record that achievement; no difficulty curve; no music or sound effects; no menu system/options. It looks bloody awful. The movement is okay-ish but needs work. The row deletion mechanism is questionable as a modern Tetris clone and just uninteresting and unsatisfying aesthetically. I've no doubt there are bugs in there. I'd say there was at least another several days of work and tuning to get it even done enough to throw out as a freebie web game (not that I would, the world has enough Tetris clones).

Would I encourage someone to do a game jam or prototype sprint? Sure. I think it's great for people to just try building a game. I do wonder sometimes if the message gets confused between "Just have a go" and "It's easy" though. If you want to try to make a game in five minutes or fifteen minutes or even Tetris in two hours then go for it. Can you do it? Maybe, although you may have to bend the meaning of "making a game" at the short timescales. Will you do it? Probably not. Is it worth doing? Only if you enjoy it and maybe finding out if you enjoy it, even if you fail, is the real thing to take away if you're interested in games development.



Sunday 6 May 2012

1D vs 2D Arrays, the Performance Reality

Programmers are generally seen, and like to think of themselves, as logical, rational people. Sometimes we get titles like "software engineer" that flatter us with the implication that we deal in known quantities and proven methods (at least to whatever extent the reputation of the title "engineer" has survived its abuses).

Unfortunately, the truth is that all of us occasionally operate more on faith than fact and there are large numbers of coders who work with a cargo-cult knowledge base rather than a deep understanding of their work. It may seem odd for such a definitively digital field, but human nature manages to create myth, rumour and unquestioned traditions of received knowledge even in the cold clarity of a world of ones and zeroes.

Even if you avoid the pitfalls of the incurious mind, there are plenty of errors of over-simplification and failure to differentiate context to be made. What's true in one language, on one platform, or in one framework, may not be true on another. Knowledge can be falsified by time as the underlying technology changes: a new processor, a new OS, a new language revision, an updated library.

These traps are common enough for programming in general but they're thick on the ground in the realm of performance. If you push it again to dealing with performance concerns on a cross-platform, translated programming environment like Monkey there are almost more traps than safe ground.

All of which brings me to the topic of the post, because the "fact" that 1D arrays are hugely faster than 2D arrays is one that I've seen repeated on a number of occasions in the Monkey forums. I asked in the past if there was any evidence for the claim but didn't get any reply. While I understood why it might be true in certain circumstances, it was hard to swallow as an unsupported assertion. So, when it recently raised its head again, I got the urge to see if it stood up to scrutiny.

1D? 2D? What?


I'll just quickly cover what we're talking about here. In the world of 2D games, it's very common to use some sort of equally 2D array of values to represent aspects of the game world. An obvious example would be a tiled map where the world is basically a grid of tile-type values. The straightforward approach to this is to use an actual 2D array so that accesses are of the form:

Local array2d:Int[][]
'initialise array here
Local val:Int = array2d[row][col]

The proposed faster alternative is to use a 1D array and to do the indexing calculations yourself:

Local array1d:Int[] = New Int[ROWS*COLUMNS]
'initialise array here
Local val:Int = array1d[row*COLUMNS + col]

Why would that be faster?


Firstly, if you're iterating over an array in some manner, you can avoid some of the indexing arithmetic. Hitting every element only requires incrementing a single index and you can traverse a column by simply incrementing by the row length. This is one step removed from the standard C technique of using a pointer to traverse an array rather than a subscript index (which is also questionable depending on circumstance, but that's another story).
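As a concrete illustration, reusing the declarations from the snippets above:

' Every element with explicit 2D indexing
Local total:Int = 0
For Local row:Int = 0 Until ROWS
    For Local col:Int = 0 Until COLUMNS
        total += array2d[row][col]
    Next
Next

' Every element in the 1D version: one index, one increment
total = 0
For Local i:Int = 0 Until ROWS * COLUMNS
    total += array1d[i]
Next

' A single column in the 1D version: step by the row length
Local colSum:Int = 0
For Local i:Int = col Until ROWS * COLUMNS Step COLUMNS
    colSum += array1d[i]
Next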

Secondly, it avoids the construction of multiple arrays, as Monkey uses the array-of-arrays model for anything above 1D. This might be cheaper to initialise, and it might also be cheaper on garbage collection costs if you're throwing these arrays away. When accessing a value, it's possible that there are costs from the row indirection that are avoided by directly calculating an index. There's also a possibility that you'll get more benefit from lower-level things like predictive caching if all the data is in one consecutive lump.

Sounds convincing. Why doubt it?


Because none of that may be true for a given target or implementation. Remember that Monkey isn't a directly compiled language. It gets translated to another language and then that code is compiled for a platform target. Those targets have very different properties and optimisation paths. Making broad simplistic statements about micro-optimisation performance across all those targets seems, well... simplistic. Pushing those statements to the point where they're blanket "1D is always much faster than 2D" is even worse.

I've been doing this coding thing for a fair while now and much of it has been spent in environments where performance is a significant concern. I've worked on low-level C programming, embedded Java apps, desktop .NET applications and in high-level scripting languages. Every one has thrown up instances of people (including me) doing things based on "facts" about performance that turned out to be entirely untrue, partially untrue or just so small in effect that there was no point in doing it.

The Tests


I set up a very simple test that constructed and initialised the values of an array, performed sequential iteration over all the elements of the array and then a large number of random accesses to the array, timing each group separately. The accessed array data was just added to a total value to minimise the costs outside of the array accesses and loop structures required.

The test ran with five variations on a one million element array of Ints: 10x100,000, 100x10,000, 1,000x1,000, 10,000x100 and 100,000x10. What's that about? Well the major difference between a 2D array and a 1D array in most OO languages is that the 2D array is an array of arrays. This means that each row is another array instance, therefore there is potentially a big difference between a "wide" array versus a "tall" array (for the purposes of this post I'll talk about arrays as row x col).

So, let's take a look at some graphs, shall we? Who doesn't like graphs?


Construction And Initialisation


Here are the results from the construction and initialisation part. The array instances were constructed and then the elements initialised to known values:

The graph is displaying the percentage difference between the timing for a 1D array and a 2D array. So, if the 2D array took twice as long, that would register as 100%. If it took half as long, then it shows as -50%.

There are some obvious big spikes on the Android and XNA platforms when dealing with the "tall" arrays, which is where I'd expect to see the biggest difference because of the numbers of objects being constructed. The gap rapidly narrows though and it's 20-30% for the square array. The Flash results are a bit weird but that double spike is repeatable - I don't know what's going on there - the other variants are pretty much even though. The GLFW target shows the 2D array as faster in all but the tallest case.

The HTML5 results I'm showing here are from Chrome 18. I didn't spot how decidedly screwy they were until I plotted the graph. Not screwy as in the results are wrong, just screwy as in those results are bizarre. For whatever reason, this particular test runs very slowly on the 1D array. I'll deal with the various browsers separately later. For now just take it as a clear demonstration of how actual performance can vary from expectation.

Iteration


The next test is simply iterating over the whole array and summing all the values. Again, the graph shows the percentage difference in time for the 2D operation versus the 1D.


Those Android figures show a fairly solid advantage to the 1D array. The rest though? Outside of the one spike for the tall array on GLFW it's all pretty close. For the faster targets the variation is small enough that the 1ms timing precision is a factor. I certainly wouldn't say anything other than "they're pretty much the same" for XNA, Flash and HTML5.

Random Access


The last test was to generate a list of random row and column indices and then run through them, summing the values. This sort of accessing tends to frustrate low-level optimisation techniques like predictive caching.


Now that is a pretty dataset if I do say so myself. The pattern across all the targets is clear and I think it can do without detailed commentary.

Are We There Yet?


Okay, so what do we make of the "1D arrays are much faster than 2D" now we've got the data? Well first we can see the nugget of truth in it -- if your array is tall and narrow going with 1D indexing will likely be faster. On the other hand, you could just swap your columns and rows and it's going to be a dead heat or maybe even a win for the 2D representation. For the usual case where the array is square-ish I can't see that the small wins that are there are worth the bother.

Frankly the number of situations where 1D might be worthwhile seems like it's going to be pretty small. Even on Android, where the 1D array is consistently faster across the board, I'd struggle to justify fiddling about with array index calculations. "No way!" I hear the 1D fans shout "There's a 20-50% performance boost!". Well, yes and no. What we're looking at here is a micro-benchmark that is specifically intended to put the costs of array access in the spotlight and it does that by doing very little with the data accessed. If you're doing anything more than adding integers then the picture changes quite rapidly.


That's a chart showing the Android costs for 2D versus 1D when calling a simple addition function versus a function that adds just a multiply, a Sin and a Cos operation on top. The earlier tests did the addition inline and you can see that just the function call overhead on the addition mutes the advantage significantly. As the function gets more computationally intensive that effect increases, and our simple bit of trig reduces everything but the worst case to under 5% difference.

Whatever array access costs are in play will quickly become swamped by any real calculation work done on the data being accessed. Unless your game loop is pretty close to the perfect storm of just summing the elements of an enormous array the actual visible frames per second speed increase from going with a 1D array is likely to be non-existent.

Don't Listen To Me


So, perhaps I've convinced you that 1D is a waste of time. Don't bother. Not worth it. Right?

Wrong.

Practically, what I'm saying is that you should just leave this technique in the toolbox until you've identified a performance hot-spot where you could gain some benefit. Hopefully the data above gives you some insight into what that situation might look like. If you find such a situation then bring it out, add a little messiness to your code and enjoy the benefits of a worthwhile speed boost in exchange.

More generally I'm saying that Monkey isn't BlitzMax, or C or Java or anything else. Micro-optimisations often fail to remain valid between compiler revisions on the same OS and hardware and they're far, far less likely to be valid across all Monkey targets. Further, when people lean on such tricks to gain performance it tends to blind them to larger optimisations at the data structure and algorithm level that can make a much greater difference and which are far more likely to hold value across multiple targets.

That HTML5 Thing


Oh, yes. So those Chrome initialisation results where 2D was many times faster than 1D? Here are the results for all the tests across Chrome, FF and IE:


There you have it. One target and there are circumstances where no matter if you go with 1D or 2D you'll lose by as much on one browser as you'll gain on another. Feel free to try to make a snappy catch-all recommendation from that.

Wednesday 25 April 2012

Monkey Tips - Default Types And Implicit Type Conversion


This one is mostly for people new to Monkey and coming from other languages. Here's some Monkey code:

Local myVarA = 1
Local myVarB = 1.5
Local myVarC = True

If you're like me when I first saw Monkey, you'll assume that what you're seeing there is type inference. If I don't declare the variable type then you'll work it out for me? Awesome. So why does this not work?

Local myVarD = "Hello World"

The answer is that no type inference is occurring. If you don't declare a variable's type then Monkey will default that type to be an integer. Further, Monkey performs implicit type conversion between some primitives. Floats can be converted to Ints and vice versa, and Bools can be converted to Ints (but not Ints to Bools). You don't even get a warning that this is happening and the first three examples above all compile happily, with you potentially none the wiser that you've just declared three Ints with a value of 1.

Now some people might consider that sort of thing convenient. For me, that the float and boolean to int conversions just go by without Monkey even raising an eyebrow is convenient like leaving your roller skates at the top of the stairs so you don't have to go rummaging in the wardrobe. You absolutely will end up accidentally lopping off your float precision at some point.
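For example (and this one compiles without a murmur):

Local speed = 0.75   ' no type declared, so speed defaults to Int
Print speed          ' prints 0; the fractional part is silently gone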

There is a partial answer to this danger and that's to switch on "strict mode" by adding the Strict keyword to the top of your file. This mode will raise an error if you don't declare a type - although it has to be said that it's not a very good error: "Illegal type expression" rather than something a little more descriptive like "Type is undefined". Strict mode doesn't do anything about implicit type conversion though and it still happens silently in literal assignments or calculations.
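A quick sketch of the difference Strict makes:

Strict

Function Main:Int()
    ' Local speed = 0.75       ' won't compile: "Illegal type expression"
    Local speed:Float = 0.75   ' Strict forces you to spell the type out
    Print "speed is " + speed
    Return 0
End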

I should mention that Monkey does offer type inference, but you have to explicitly invoke it via the ":=" operator:

Local thisIsAFloat := 1.5

<sarc>Which, I'm sure you'll agree, is something you'd never forget or mess up. Even if you did miss the colon, you couldn't fail to spot such a glaring error, right?</sarc> Seriously, just use Strict.

Sunday 1 April 2012

Monkey Tips - Boxes Caveats Part 2


This is a continuation from Part 1 of my discussion of primitive type boxing in Monkey. In this part I'll get into issues that primarily concern expectations if you're coming from Java or a language that handles auto-boxing in a manner similar to Java.

We're Not Writing Java Any More, Toto


One thing to be aware of is that Monkey's base box classes are not as integrated into the language as Java's implementation. In Java, the compiler will perform boxing in situations where Monkey will not. If I have a Java method that takes an Object reference:

public void add( Object value ){...}

Then Java allows me to call it with a boxable primitive value: add(10) or add(true), and the compiler will automatically box to an Integer object or a Boolean object respectively. Monkey won't do this. However, if, in Monkey, you declare a method that takes a box object reference:

Method Add:Void( value:IntObject )

Then you can call it with primitive values that are boxable by that type: Add(10).

Hey Toto, Is a Scuba Diver a Mammal or a Fish?


This one is more of a gotcha on a design level than usage. Let's say I have a generic container class that can store primitives or Object references and offers an overloaded Add method to do this:

Class MyContainer
    Method Add:Void( value:Int )
    Method Add:Void( value:Float )
    Method Add:Void( value:String )
    Method Add:Void( value:Bool )
    Method Add:Void( value:Object )
End

Now this works fine, until you try to store an object that implements an unboxing method. If you do that, you get an error: "Unable to determine overload to use". As far as Monkey is concerned the thing you are passing can be an Object or it can be one of the primitives. In order to successfully store the box object you have to explicitly cast the reference: myContainer.Add( Object(myBox) ).

This might seem fair enough but it is somewhat at odds with how things work in other OO languages. In Java, the rules are that if a match can be made without boxing or unboxing then that match is used, and in this case the boxed object is a descendant of Object, so that's a clear match.

There's No Place Like Home?


Why doesn't Monkey do the same? Well, when I raised this problem on the Monkey forums the term "exact match" was used, implying that the box class isn't an exact match for an Object so the overload can't be resolved. I'd argue that this is inconsistent with basic OO concepts. In OO, descendants have to be substitutable for their parent classes. In other words you have to be able to use a descendant class anywhere you can use the base class and so they are matches by definition. Requiring that you cast a reference to be an Object reference is a nonsense as all references must be Object references (because that's the root object type and everything descends from it).

I'd also argue that this behaviour is inconsistent with Monkey's own behaviour elsewhere. The "=" comparison mentioned in the first boxing post is an example of this. In that case our box objects could arguably be considered to match a primitive or an Object reference but Monkey clearly chooses to prefer to compare them as Objects, which makes sense because that's actually what they are.

Needless to say I'm not a fan of this language design choice. It may be that the majority of Monkey programmers will never come across the issue, or maybe they won't see it as a problem if they do, but it has certainly caused me some headaches. It forced me to avoid using an overloaded primitive/Object API for my JSON library with the result that the API is not as clean and consistent as I'd like. No, it's not insurmountable, but it's the kind of rough edge that can turn developers off a language. Being different is fine, but being arbitrarily different in ways that don't seem very helpful isn't.

Wednesday 28 March 2012

Monkey Tip - Be Careful Around Boxes

No, this isn't a warehouse safety message about lifting with your legs. Monkey features auto-boxing/unboxing between classes and primitive types. Coders with backgrounds in other OO languages like Java will probably be familiar with the concept but I imagine there are plenty of Monkey users who aren't, so it's probably worth explaining it a little.

Boxes? Huh?


The term "box" means a class that contains, or wraps, a primitive value type. The standard examples in Monkey are the IntObject, FloatObject, BoolObject and StringObject classes that you'll find in the boxes file in the standard monkey module. These classes are generally used so that primitives can be treated as object instances and vice versa. The auto part of the boxing operations allows you to write code like this:

Local i:IntObject = 1
i += 1

Why would you want to do this? Well in Java this sort of thing is often employed so that primitives can be used with collections classes that require object references. Monkey's standard collection classes take a different route and don't use boxes but it's likely that you'll run into libraries with collections classes that do (Diddy, for example). Other than that it's sometimes useful to be able to declare that a class has a primitive representation or to create "smart" primitives. My JSON library makes use of boxing so that you can easily retrieve primitive values in the JSON structures. For example:

Local x:Float = jsonObject.GetItem("x")
Local y:Float = jsonObject.GetItem("y")
Local isActive:Bool = jsonObject.GetItem("isActive")
Local name:String = jsonObject.GetItem("name")

So, Monkey's automatic boxing and unboxing deals with the chore of converting between the representations and, in theory, the code is easier to write and read. I say "in theory" because the syntax is hiding what is actually going on. It's an example of what some refer to as syntactic sugar or, less flatteringly, "magic". It pays to understand what's actually happening, especially if you intend to use the feature yourself, as the reality can trip you up.

Pulling Back the Curtain


Let's demystify this language magic a little bit. What actually happens in the first example above? Here are those two lines of code compiled to C#:

bb_boxes_IntObject t_i=((new bb_boxes_IntObject()).g_IntObject_new(1));
t_i=((new bb_boxes_IntObject()).g_IntObject_new((t_i.m_ToInt())+1));

The first line is relatively straightforward -- the assignment is converted to a constructor for an IntObject instance.

The second line is a bit more interesting. It shows that the simple += operation is converted into a call to the ToInt method on our previously constructed IntObject, an addition and then another constructor call to create a new IntObject instance with the summed value.

And that's actually all there is to boxing and unboxing. It's simply a matter of implementing certain method signatures for construction with, and returning of, the boxed value. In general form they are:

Method New( value:{PrimitiveType} )
Method To{PrimitiveType}:{PrimitiveType}()

Where {PrimitiveType} is replaced with Int, Float, Bool or String, e.g. Method ToInt:Int(). All you have to do is implement those methods on any class and, presto, you've got an auto-boxing class. You don't have to implement all of them, for instance it is very common to only implement ToString, which provides a useful shortcut to creating debug prints like:

Print "My class info is: " + myBox

So what are the problems?

Performance


The first one may have been obvious from the simple IntObject example above. Boxing and unboxing is much more expensive than using the primitive types. Assignment means construction of a new instance and that can lead to high garbage collection costs if you use boxed types without care. If you need to process boxed values somewhere performance-sensitive then you should unbox them, work with the primitive values and then assign them back to the boxed versions.
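For instance, given a List of boxed scores to total up, I'd write it along these lines:

Local scores:List<IntObject> = New List<IntObject>
For Local i:Int = 1 To 100
    scores.AddLast( i )        ' boxes once per add; unavoidable here
Next

' Unbox inside the hot loop and accumulate as a primitive...
Local total:Int = 0
For Local score:IntObject = Eachin scores
    total += score.ToInt()
Next

' ...and box the result just once at the end if you need an object
Local boxedTotal:IntObject = total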

Compatibility


Be aware that auto-boxing is a Monkey feature that didn't arrive fully formed. In particular, the demo version of Monkey is currently at 45c and some parts aren't working in this version, e.g. the "+=" operator from above. If you're releasing code for others to use then this may mean that you have to field bug reports/complaints.

True may no longer be True and False may be a Null Pointer Exception


When you implement the box methods you are potentially changing how Monkey will interpret operations on objects of that class. A good example is the use of the following as a test for an uninitialised reference:

If myInstance
    DoAThing()
End

In the normal way of things, that code would only execute DoAThing() if myInstance had been initialised but not if it hadn't or if it was set to Null. However, if myInstance's class implements the ToBool method then Monkey will invoke it and attempt to unbox to test the boolean value. This will, of course, crash with a null reference exception of some kind if myInstance is uninitialised.

A similar problem can occur when doing type-testing via the casting mechanism:

If MyClass(myInstance)
    DoAThing()
End

Again, if MyClass declares ToBool then this will result in an attempt to unbox after the cast. If myInstance isn't an instance of MyClass then you'll get a null reference error. The answer to both of these is to explicitly test against Null, e.g.

If MyClass(myInstance) <> Null

That way Monkey doesn't attempt to unbox to a Bool. In fact, I'd recommend getting into the habit of typing out the explicit test as this is a potential cause of very difficult to find bugs.

One May Not Equal One


Another gotcha to be aware of around boxed primitives and logic is that the "=" comparison won't cause values to be unboxed.

Local a:IntObject = 1
Local b:IntObject = 1

If a=b
    Print "This is never printed (unless Monkey starts to do interning and then it might be!)"
End

a is not equal to b in the example above because the comparison tests whether a and b are references to the same object and not whether they box the same value. To check the values you would have to unbox at least one of them yourself to make the comparison based on the integer value. Stylistically, I'd suggest explicitly unboxing both: "If a.ToInt() = b.ToInt()"

I'll end part one of this post here. Yes, there's more.

Tuesday 27 March 2012

Monkey Tip: Relative Target Performance

I've previously posted about how Android (at least on my phone) is a platform that presents performance challenges when doing cross-platform development with Monkey*. This is true, but I wanted to post something more general about cross-platform performance. So, here it is.

 *This isn't specific to Monkey, of course, but the comfortable abstraction that Monkey provides means that it's more tempting to believe that all the platforms are the same than if you were writing your own cross-platform libraries.

Box2D


First, let's look at an example of an Android performance gap. Here are the timings for the Box2D domino pyramid performance test. They show the results for a run of 300 updates, ignoring all rendering and other per frame costs.

HTML5(Chrome 17)
total: 3334, low: 8, high: 18, avg: 11.07641196013289

XNA(Windows)
total: 764, low: 0, high: 16, avg: 2.538206

Android(ZTE Blade running 2.3.7)
total: 43988, low: 109, high: 295, avg: 146.13954

That's right. No joke. The phone is over 10x as slow as Javascript and nearly 60x slower than an XNA build running on my modest laptop (2GHz T4200, Intel GM45 chipset, if you feel the need to know). You really need to be careful about making any judgements about how fast your app will be on a phone if you're taking advantage of the rapid turnaround on another target.

However, that Box2D test is all about crunching numbers, constructing objects and other CPU/VM-heavy activities. For many games the rendering performance impact is going to outweigh the computational costs and so it's useful to have an idea of how things relate there.

Rendering Comparison


Here's an example for my current project. I was interested in being able to "fog out" objects to simulate distance (or, hey, actual fog). The simplest mechanism I could think of for doing this was to render a rectangle with a suitable alpha over the top of something that I wanted to be dimmed.

Before I went with that as a technique I wanted to understand what the costs might be across the platforms, so I whipped up a test scenario and got out my measuring stick/timing class. The test was simply to render a 400x400 rectangle with a 0.1 alpha 1000 times, repeat that enough to get a decent average and, as far as possible, separate out the actual costs of the rectangle rendering.
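The harness amounted to something like this mojo app (a sketch: the real version averaged over many frames and subtracted the loop overhead):

Import mojo

Class RectTest Extends App
    Method OnCreate:Int()
        SetUpdateRate( 60 )
        Return 0
    End

    Method OnRender:Int()
        Cls()
        Local start:Int = Millisecs()
        SetAlpha( 0.1 )
        For Local i:Int = 0 Until 1000
            DrawRect( 0, 0, 400, 400 )
        Next
        SetAlpha( 1.0 )
        Print "1000 rects: " + ( Millisecs() - start ) + "ms"
        Return 0
    End
End

Function Main:Int()
    New RectTest
    Return 0
End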

Here are the average results for the 1000 rectangles:

Flash Player 11 (in Chrome): 1360ms
HTML5 (IE9): 425ms
HTML5 (Firefox 12): 240ms
HTML5 (Chrome 17): 215ms

XNA (Windows): 250ms
GLFW (Windows): 260ms

Android (ZTE Blade): 794ms

On the web side that result for Flash was a bit of a shocker. I'm not sure if it represents the true poor performance of Flash in this case or it's a problem with Monkey's translation. The FF and Chrome JS/Canvas engines are reasonably close together, but IE seems to be off on its own somewhere more relaxed.

The XNA and GLFW figures give a view of a "proper" desktop target. Oddly, they're marginally slower than the fastest canvas targets. I honestly don't know why that would be the case. I guess it's possible that the canvas rendering is as fast as the windowed XNA and OGL paths and maybe v-synching explains the rest, but it doesn't really add up.

As could be expected, my little Android phone is some way behind the beefier platforms but it's only 3-4x slower, which is far, far closer than in the Box2D update comparison. With these numbers I can be reasonably confident that I'm not digging a hole for myself by using a few fog rectangles around the place.

The other thing I did was to quickly test some variations of numbers and size of rectangle to see if these times could be generalised to fill-rates. I couldn't raise the enthusiasm to draw up some graphs but perhaps you'll take my word for it that they pretty much can be. Render half as many rectangles? Half the time. Render twice as many at a quarter of the area? Half the time. So I can, as a rule of thumb, figure that I can fill a screen's worth of my 800x480 phone with alpha rectangles in under 2ms (the screen is about 2.4 of those 400x400 test rectangles, and each one cost roughly 0.8ms on the Blade). That's a handy thing to be able to bear in mind.

As We're Here, What About Bitmaps?


While I had the code all set up I thought I might as well check to see what the relative cost of rendering a bitmap with embedded alpha was (like a smoke texture). So here's the same test but drawing a 400x400 semi-transparent image:

Flash Player 11 (in Chrome): 475ms
HTML5 (IE9): 2970ms
HTML5 (Firefox 12): 625ms
HTML5 (Chrome 17): 550ms

XNA (Windows): 273ms
GLFW (Windows): 275ms

Android (ZTE Blade): 1590ms

The fact that Flash is considerably faster at rendering this test makes me more suspicious that the rectangle rendering is doing something odd. Worth investigating some time.

Both FF and Chrome are about 2.5 times slower at rendering bitmaps, but look at IE! That's really slow. So slow in fact that it declared the page unresponsive a few times. The world of JS engines and Canvas implementations continues to offer excitement, mystery and intriguing inconsistency to the unwary dev (and the wary ones, in fact).

The XNA and GLFW numbers barely change from the rectangle rendering. The Android time has doubled, but it's in line with the hit on HTML5, so we can still have a rough feel for what we can get away with if we're mostly building to HTML5 on Chrome or FF.

The Conclusion Bit


What I take away from this is that it pays to test the impact of a specific technique across the platforms you care about before you tie yourself to it. You can't make assumptions that relative performance between targets in one area will hold in another. In fact you can't even trust that the same target will do so in the case of HTML5. Measure twice, cut once and all that.

Why Monkey Isn't A Terrible Choice

Psst... this is a big post. I get that you're a busy person avoiding lots of important tasks by reading this blog post. If you're desperate to get on with avoiding those tasks elsewhere I'd suggest at least reading the last entry.


Monkey Logo

Monkey is a cross-platform language that I've been using for games coding for around a year now. The general idea is that you write code in Monkey and then that code is translated to a "native" language for a specific platform, so Objective-C for iOS, Java for Android, AS3 for Flash, C# for XNA etc. I picked up the demo not long after release and bought a full license a little while after. In the time I've been using it I've ported two physics libraries (Fling and Box2D), released my own JSON library and I'm nearing completion on my first game based on my own framework.

For a while now I've been considering posting about Monkey (and its general development environment) but I've held off doing so. The reason being that I am, by nature, fairly critical.
I'm not as bad as this guy. He sucks at being critical.
I like to think that I'm at least as hard on myself as on others, but even if I am it's not necessarily much comfort to those others. Monkey has been going through some growing pains and, while I'd always give an honest opinion if asked, I didn't want to broadcast negatives about an immature product. Now, though, I feel that I've added a fair amount to the Monkey community to offset any negative views. Also, it has been a year and Monkey should be able to stand some criticism.

However, before I start picking holes in the product, I want to lay some positive foundations. So that's what this post is - a set of reasons to give Monkey a shot that I can point to in the future when people accuse me of being negative and spanking the Monkey, so to speak. On with the Monkey love:
Hot.


Reason 1 - Mom 'n' Pop Service


Mark Sibly at BRL

Monkey is developed by Blitz Research Ltd (BRL). It's a very small company with pretty open access. Got a problem with Monkey? You can email Mark Sibly, the programmer and el queso grande of BRL. Something wrong with the provided Monk editor? The guy who writes the editor is on the forums. If you like dealing with small companies rather than large corporations or amorphous open-source organisations then Monkey gives you that level of personal interaction.

Reason 2 - It's The New Thing, Same As The Old Thing


In a good way.

If you're a programmer who is seriously intending to create games then attaching yourself to a new technology stack is fraught with dangers. The investment in learning that stack and the time/effort value that will exist in the code you write to run on that stack is considerable. In a worst case scenario you can end up throwing away weeks or months of effort. I will admit that I was highly dubious about Monkey but there are some facts that persuaded me that it was at least worth a look:

BRL has a history of creating high-level game-friendly programming languages going back to the days of the Commodore Amiga. I'm old enough to have briefly used Blitz Basic back then in some early dabblings with graphics coding. So, the company has past form in terms of creating and maintaining these products and they've done it successfully enough to still be doing it.

The Monkey language, while bringing a bunch of new stuff to the table, is still in that same family of BASIC-ish languages. The language offers familiarity to many people and a relatively readable syntax to beginner programmers - unless you really hate BASIC, that is. I'm on the fence as to whether I hate or loathe BASICs, but it's not an unknown landscape to me and that has value when faced with so many other unknowns.

Reason 3 - It's Cheap, But Not Too Cheap


Monkey is currently 120 USD for the full license. That's a one-off charge with no annual support fees or royalty cuts. Price-wise it undercuts the commercial competition by some margin (although it also provides fewer features; I won't claim that it's a direct replacement for Unity, for example).

On the other hand, 120 USD isn't peanuts. I'm a big fan of open-source, and HaxeNME, which does mostly the same thing for free, is coming along nicely. Surely free is better than 120 dollars? In the wallet, sure, but there's something to be said about the value of a product that you know someone is invested in for their livelihood. Sometimes open-source projects are amazingly high quality and the teams are responsive to their users' needs, and sometimes they really aren't. I don't know which one HaxeNME is or will be, but you might consider that 120 dollars is a fair price to be able to at least imagine that you are understood to be a customer.

Reason 4 - It's Not a Black Box


Some cross-platform solutions guard their workings jealously, or force you to publish via server processes that they control. Monkey does none of this. If you want to get into the workings of things then you can.

[Image: Monkey coding is this manly.]
The language itself is public domain, and the source for everything is openly provided in the distribution. The translation process spits out readable(ish) code in the target's native language and even creates IDE project files for many of the targets.

This means that, if you're willing and able to, you can operate directly on the native code using native tools (e.g. profilers and debuggers). You can even change the Monkey language itself if you want and are willing to carry the burden of maintaining the branch.

Reason 5 - It's Not Horrifically Slow


[Image: ...considering what it is]

Performance was a concern for me. Mobile devices are amazingly powerful these days, but you're still dealing with some compromises, and an abstracted rendering layer and a translated language are compromises on top of those.

I've been pleasantly surprised though. There's no doubt that you would get more performance by working natively, but Monkey's overhead isn't going to stop you from creating attractive 2D games. At some point I'll see about making proper comparisons if I can find simple game-like demos to reproduce, but for now I'll offer mostly anecdotal reassurance plus my one data point: my Box2D port to Monkey runs at about half the speed of libGDX's (honestly, that's not terrible!).

Reason 6 - It's Not a Toy


[Image: But it's fun]
I'll be honest. In the early days of using Monkey I really thought it was a dead-end for me. I pretty much immediately started writing framework code and converting the Physaxe library, and it was obvious that nothing substantial had been written in the language during beta testing. Blocking issues with the compilation tool-chain flaking out were a regular occurrence, and after working in C/C++, Python, Ruby, Java and C# the language seemed inconsistent and under-powered in a design sense.

I think I turned a corner on my plans to jump to Unity, Corona or Haxe when Monkey implemented interfaces. Initially, when asked about adding them, the response was a blunt statement that they wouldn't happen until a larger rewrite that was "at least a year" away. That was no good to me and I started looking around for options. Then, for whatever reason, interfaces were added. They were somewhat broken at first, but it rescued my sense that the language could support what I wanted to do (without excessive frustration). Since then, as I mentioned at the start of the post, I've written thousands of lines of fairly structured code and nearly all of them work.

Monkey isn't a power-house of language features, "enterprise-ready" or a startlingly elegant example of language design, but it is capable of supporting the creation of reasonably complex software without it collapsing into a ball of mud. It avoids the trap of chasing ease of use at the beginner level at the expense of supporting your growth into a more expert programmer and software designer.

Reason Everything - Fear Not The Future


[Image: Because the future will be this awesome.]
Monkey offers out-of-the-box (some assembly required), push-button compilation to HTML5, Flash, iOS, Android, WP7 (XNA), XBox (XNA), Windows (XNA/GLFW) and MacOS (GLFW). I won't claim that it's perfect or without complications but, for the most part, you can write code once and then just compile to the platform you're interested in.

There's no doubt that this is the real reason I picked Monkey up in the first place and the reason I stuck with it; all the other reasons listed above together aren't as important as this one. I wanted to try creating and selling a game, but I was unsure which platforms to target. I was especially concerned with the possible need to push out a web version as well as a mobile version, and to be able to hit both iOS and Android. As a solo coder, the thought of having to create a cross-platform game in C/C++ and then maybe port it to Flash/HTML5 for a promotional web version was daunting. It would also have front-loaded a bunch of learning and costs (if I wanted to write for Android and iOS then I'd need to buy an Android device, an iPhone or iPod Touch, and a Mac of some sort from the start). By going with Monkey I could push nearly all of that back and just get on with writing the game.

It gets better though. Monkey doesn't solve all of the cross-platform issues, just most of them, but for the ones it doesn't solve it provides pretty easy routes to solve them yourself. Writing native extensions is quite simple (see the sketch below). Monkey also generates working projects for the platforms, so you can, if you want, just open up the translated code in XCode, Eclipse or VS and add platform-specific features.
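To give an idea of the shape of a native extension, here's a hypothetical vibration binding for Android. The Vibrate function, the vibrate.java file and its contents are all made up for illustration, and the exact import and binding syntax varies between Monkey versions, so take this as a sketch rather than working code:

    ' Pull in the hand-written native source for the current target.
    #If TARGET="android"
    Import "native/vibrate.java"
    #End

    Extern

    ' Binds the Monkey-visible name on the left to the
    ' native function name on the right.
    Function Vibrate:Void(millis:Int) = "vibrate"

    Public

The matching vibrate.java would then implement vibrate(int millis) against the Android APIs, and other targets could get either their own implementations or a stub.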

But there's more. We're in an era of platform proliferation and shifting ground. Windows 8 is around the corner, with Metro-based tablets and Microsoft's own concept of an app store. Will MS relent and allow XNA? Will HTML5 be a realistically performant target? Panic? Probably not, if you're using Monkey. When the smoke clears on the tech side (and if Metro doesn't flop), BRL or someone else will put together a target for it and you'll recompile. No reason to panic at all.

[Image: Always carry a towel. Coding in Monkey helps too.]
But what if they don't? Or what if BRL goes bust, or Mr Sibly gets bored or something? Still no reason to panic. As I mentioned above, the Monkey language itself is public domain; only the graphics, sound and input abstraction library, mojo, is under a commercial license, and the source code for everything comes with the installation. As such, it seems highly likely that a community effort could quickly be put in place to keep the language and tools going.

Even if, for the sake of argument, you decide that you absolutely must jump ship, you can still compile your Monkey code to one of the targets and use the output as a base for your next step. The translated code isn't pretty, but it's munged in a predictable way, and I doubt it would take more than a week or so over a project to tidy it into something perfectly reasonable.

Ultimately that's what I feel Monkey gives me: options and the ability to delay decisions with minimal cost. There's no guarantee that Monkey is the perfect choice for any given project but it is certainly one of the least binding imperfect choices you can make.

Monday 26 March 2012

Monkey vs. libGDX Box2D Performance On Android

Note: The figures in this post are now out-of-date. I posted an update here.

The game I'm working on uses Box2D, or rather a port of Box2DFlash to Monkey that I did. My game's use of Box2D isn't all that stressful, but I've been curious about exactly what sort of performance hit is entailed in using my port. In particular I've been curious about the cost on Android, as that's the platform where things are tightest.

The problem with comparing these things is that it's hard to remove all the uninteresting variables, such as differences in the scenes being modelled or the costs of rendering. Today my curiosity got the better of me, though, so I downloaded the libGDX code, which compiles for Android and includes a version of Box2D accessed via JNI.

In order to compare like with like, I ported the most intensive demo from the Monkey test suite, a domino stack. I also fixed the update in libGDX's tests (which was oddly set to use the frame delta-time, meaning that all the collisions failed if the frame rate dropped) and made the update iterations match the Monkey test standard; the shape of that fix is sketched below. Finally, I changed the timing code to care only about the update and to report the highest, lowest and average update times.
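For the curious, the fix boils down to stepping the world with a fixed timestep rather than the frame delta, and timing only that call. Here's a Monkey-flavoured sketch; world, worst, best, total and samples are illustrative names, the iteration counts are placeholders rather than the actual test-standard values, and I'm assuming the port keeps Box2DFlash's b2World step signature:

    ' Physics should see the same dt every update; feeding in the
    ' frame delta means collisions misbehave when the frame rate drops.
    Const TIME_STEP:Float = 1.0 / 60.0
    Const VEL_ITERATIONS:Int = 10
    Const POS_ITERATIONS:Int = 10

    Method UpdatePhysics:Void()
        ' Time only the physics step, not the rendering.
        Local start:Int = Millisecs()
        world.Step(TIME_STEP, VEL_ITERATIONS, POS_ITERATIONS)
        Local took:Int = Millisecs() - start

        ' Track the highest, lowest and running-average update times.
        worst = Max(worst, took)
        best = Min(best, took)
        total += took
        samples += 1
    End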

Here's the result (on my ZTE Blade running Android 2.3.7):

[Image: update timings for the libGDX version]

So we're looking at a range of roughly 65-100 ms with an average of 75 ms. How does that compare to the Monkey port? The libGDX version is about twice as fast on average, with the Monkey port running at about 140-150 ms per update. Monkey's variability is also worse: its worst case hits 300 ms, so libGDX wins by an even bigger margin in that regard.

Although that sounds like bad news, I'm less concerned now. The domino stack demo represents a scenario to be avoided anyway. If your game relies on that sort of load then today's mid-level mobile devices aren't really where you should be planning to release, and certainly not without expecting to go native. For my own needs, I'm quite okay with half the speed in return for the ease of cross-platform development.

That's not to say I'm happy to leave it as is. I doubt there's much more performance to wring out of micro-optimisations to the port, but I certainly plan to revisit potential ways to remove the GC spikes. Another possibility is porting more recent code from the main Box2D implementation, though I've no idea whether there have been any relevant performance improvements there.

There is one other potential source of a performance boost, and that's if a native (C++) Android target is added to Monkey. I can imagine there are some headaches there with device compatibility and the like, but for computationally intensive stuff like Box2D it could make a big difference.