Showing posts with label Android. Show all posts
Showing posts with label Android. Show all posts

Friday, 8 March 2013

Update On Box2D Performance vs. libGDX


A while back I posted a quick performance comparison between my Monkey port of Box2D and libGDX's JNI version. At the time the libGDX version was about twice as fast.

Since then I've done a bit of optimisation work and I thought it would be worth posting to note that I've just uploaded 1.0.12 of the port and to update the figures.

The Monkey performance on the same domino stack benchmark from the previous post is now ~88ms on average. Assuming that libGDX's Box2D implementation remains the same at ~75ms that means the performance gap has narrowed from 100% to under 20%. The variability of performance with the Monkey version is still higher, with spikes to >160ms where libGDX keeps it under 100ms but it's a much closer race now. Note that this is just a headline figure on a very collision/contact intensive example that is intentionally chosen for being slow on Android, performance in general is likely indistinguishable for the most part.

The performance improvements are significant enough to enable Monkey devs to target lower powered devices and/or create more physics intensive games. So, if you shelved something because the framerate wasn't quite there then it might be worth grabbing the newer module version and taking another look.

Tuesday, 27 March 2012

Monkey Tip: Relative Target Performance

I've previously posted about how Android (at least on my phone) is a platform that presents performance challenges when doing cross-platform development with Monkey*. This is true, but I wanted to post something more general about cross-platform performance. So, here it is.

 *This isn't specific to Monkey, of course, but the comfortable abstraction that Monkey provides means that it's more tempting to believe that all the platforms are the same than if you were writing your own cross-platform libraries.

Box2D


First, let's look at an example of an Android performance gap. Here are the timings for the Box2D domino pyramid performance test. They show the results for a run of 300 updates, ignoring all rendering and other per frame costs.

HTML5(Chrome 17)
total: 3334, low: 8, high: 18, avg: 11.07641196013289

XNA(Windows)
total: 764, low: 0, high: 16, avg: 2.538206

Android(ZTE Blade running 2.3.7)
total: 43988, low: 109, high: 295, avg: 146.13954

That's right. No joke. The phone is over 10x as slow as Javascript and nearly 60x slower than an XNA build running on my modest laptop (2GHz T4200, Intel GM45 chipset, if you feel the need to know). You really need to be careful about making any judgements about how fast your app will be on a phone if you're taking advantage of the rapid turnaround on another target.

However, that Box2D test is all about crunching numbers, constructing objects and other CPU/VM-heavy activities. For many games the rendering performance impact is going to outweigh the computational costs and so it's useful to have an idea of how things relate there.

Rendering Comparison


Here's an example for my current project. I was interested in being able to "fog out" objects to simulate distance (or, hey, actual fog). The simplest mechanism I could think of for doing this was to render a rectangle with a suitable alpha over the top of something that I wanted to be dimmed.

Before I went with that as a technique I wanted to understand what the costs might be cross the platforms, so I whipped up a test scenario and got out my measuring stick/timing class. The test was simply to render a 400x400 rectangle with a 0.1 alpha 1000 times, repeat that enough to get a decent average and, as far as possible, to separate the actual costs of the rectangle rendering.

Here are the average results for the 1000 rectangles:

Flash Player 11 (in Chrome): 1360ms
HTML5 (IE9): 425ms
HTML5 (Firefox 12): 240ms
HTML5 (Chrome 17): 215ms

XNA (Windows): 250ms
GLFW (Windows): 260ms

Android (ZTE Blade): 794ms

On the web side that result for Flash was a bit of a shocker. I'm not sure if it represents the true poor performance of Flash in this case or it's a problem with Monkey's translation. The FF and Chrome JS/Canvas engines are reasonably close together, but IE seems to be off on its own somewhere more relaxed.

The XNA and GLFW figures give a view of a "proper" desktop target. Oddly, they're marginally slower than the fastest canvas targets. I honestly don't know why that would be the case. I guess it's possible that the canvas rendering is as fast as the windowed XNA and OGL paths and maybe v-synching explains the rest, but it doesn't really add up.

As could be expected, my little Android phone is some way behind the beefier platforms but it's only 3-4x slower, which is far, far closer than in the Box2D update comparison. With these numbers I can be reasonably confident that I'm not digging a hole for myself by using a few fog rectangles around the place.

The other thing I did was to quickly test some variations of numbers and size of rectangle to see if these times could be generalised to fill-rates. I couldn't raise the enthusiasm to draw up some graphs but perhaps you'll take my word for it that they pretty much are. Render half as many rectangles? Half the time. Render twice as many at quarter of the area? Half the time. So I can, as a rule of thumb, figure that I can fill a screen's worth of my 800x480 phone with alpha rectangles in under 2ms. That's a handy thing to be able to bear in mind.

As We're Here, What About Bitmaps?


While I had the code all set up I thought I might as well check to see what the relative cost of rendering a bitmap with embedded alpha was (like a smoke texture). So here's the same test but drawing a 400x400 semi-transparent image:

Flash Player 11 (in Chrome): 475ms
HTML5 (IE9): 2970ms
HTML5 (Firefox 12): 625ms
HTML5 (Chrome 17): 550ms

XNA (Windows): 273ms
GLFW (Windows): 275ms

Android (ZTE Blade): 1590ms

The fact that Flash is considerably faster at rendering this test makes me more suspicious that the rectangle rendering is doing something odd. Worth investigating some time.

Both FF and Chrome are about 2.5 times slower at rendering bitmaps, but look at IE! That's really slow. So slow in fact that it declared the page unresponsive a few times. The world of JS engines and Canvas implementations continues to offer excitement, mystery and intriguing inconsistency to the unwary dev (and the wary ones, in fact).

The XNA and GLFW numbers barely change from the rectangle rendering. The Android time has doubled, but it's in line with the hit on HTML5, so we can still have a rough feel for what we can get away with if we're mostly building to HTML5 on Chrome or FF.

The Conclusion Bit


What I take away from this is that it pays to test the impact of a specific technique across the platforms you care about before you tie yourself to it. You can't make assumptions that relative performance between targets in one area will hold in another. In fact you can't even trust that the same target will do so in the case of HTML5. Measure twice, cut once and all that.

Monday, 26 March 2012

Monkey vs. libGDX Box2D Performance On Android

Note: The figures in this post are now out-of-date. I posted an update here.

The game I'm working on uses Box2D, or rather a port of Box2DFlash  to Monkey that I did. My game's use of Box2D isn't all that stressful but I have been curious as to exactly what sort of performance hit is entailed in using my port. In particular I've been curious about the cost on Android as that's the platform where things are tightest.

The problem with comparing these things is that it's hard to remove all uninteresting variables such as differences in the things being modelled or the costs of rendering and such. Today my curiosity got the better of me though and I downloaded the libGDX code, which compiles for Android and includes a version of Box2D accessed via JNI.

In order to compare like with like, I ported the most intensive demo from the Monkey test suite, which is a domino stack. I also fixed the update in libGDX's tests (which was oddly set to use frame delta-time, meaning that all the collisions failed if the frame rate dropped) and made the update iterations match the Monkey test standard. Finally I changed the timing code to only care about the update and to give values for the highest, lowest and average update times.

Here's the result (on my ZTE Blade running Android 2.3.7):



So we're roughly looking at a range of 65-100ms with an average of 75 ms. How does that compare to the Monkey port? The libGDX version is about twice as fast on average with the Monkey port running about 140-150ms per update. Monkey's variability is worse though and the worst case hits 300ms, so libGDX is even better in that regard.

Although that sounds like bad news, I'm less concerned now. The domino stack demo represents a scenario to be avoided. If your game relies on that sort of thing then today's mid-level mobile devices aren't really where you should be planning to release and certainly not without expecting to go native. I'm quite okay with half the speed in return for the ease of cross-platform development for my own needs.

That's not to say I'm happy to leave it as is. I doubt there's much more performance to wring out of micro-optimisations to the port but I certainly plan to revisit potential ways to remove the GC spikes. Another possibility is porting more recent code from the main Box2D implementation, but I've no idea if there have been any relevant performance improvements there.

There is one other potential source of a performance boost and that's if a native Android target is added to Monkey. I can imagine there are some headaches there with device compatibility and the like but for computationally intensive stuff like Box2D it could make a big difference.