Pointless Diversions

Friday, 1 March 2013

F2P, IAP, Micro-transactions and DLC

The news that EA is planning to put micro-transactions into their games has brought the simmering discussion surrounding IAP back to a boil. For many it represents a new offensive on the "core" gaming territory by a business model that had been mostly confined to mobile and Facebook casual.

I think gamers are right to be very suspicious of how this will play out. If this review of the EA published Real Racing 3 on iOS is anything to go by then they're willing to try putting frequent and high tariffs on gameplay. On the other hand Real Racing 3 is still a free-to-play iOS game. How they'll incorporate micro-transactions into PC or mainstream console titles has yet to be seen.

There is a strong argument that consumers have a choice to buy or not buy whatever is offered (as voiced by CliffyB while I was writing this post). I can't deny the truth in that but it doesn't mitigate the valid concerns of fans of existing games titles. EA owns many well-loved brands and fans of those brands are now faced with the possibility that future iterations will be warped to conform to a business model that needs to generate ongoing payments. It's not just a matter of the games being less fun, or even more expensive, it's the possibility that the very nature of those franchises will be changed. SimCity, Command and Conquer or Mirror's Edge wrapped round a micro-transaction model may well not be SimCity, C&C or Mirror's Edge as we know them. There won't be a choice to buy or not buy the games we want -- those games will simply not exist any more.

Maybe, Just Maybe, It'll Be Okay

There is, as always, a middle ground. It's the zone in which we can imagine games that incorporate micro-transactions in non-destructive ways. Unfortunately, a position that recognises the danger in applying free-to-play models to the traditional games market but also holds that it's not uniformly evil is too grey to be attractive. Nuanced opinions don't get heard much in a debate that exists between corporate spin-drone press-releases and hyperbolic teenaged gamers in comment sections.

Despite the lack of vocal recognition the reality is that IAP just means "stuff" you can buy from within the game. DLC is just "stuff" you can buy to add to your existing game whether from inside or outside that game. "Micro-transactions" is just code for virtual currency to buy "stuff"*. F2P just means it's free to install and play at least some of it or for some period of time and you'll have to buy "stuff" to play more.

None of these things are inherently evil concepts. My local coffee shop is F2P in that you can walk in and hang around for free. They have IAP coffee and cakes and a micro-transaction currency involving gift vouchers and reward stamps. If I stretch a bit I can even say they have DLC ground coffee that you can order for use at home. I doubt that many people would compare them to Hitler for doing any of that.

The debate needs to move from blanket "this thing good"/"this thing bad" to being about the specific behaviours. People need to express what is not okay and what is okay. Not all DLC is horse armour. Not all F2P is Real Racing 3.

If EA sells me a full-price Tomb Raider and then offers extra hats, clothes, hairstyles for more further payment then that's fine. If they want to sell me ammo or "Lara Energy" needed to play freely then it's not. I'll be honest, I probably won't buy the hats but some people will and I don't mind them being available.

If EA sells me a full-price Need For Speed and then offers extra cars, paint jobs, decals for further payment then that's fine. If they only offer the "best" car as an added extra then it's not. It's really, really not okay to do this if a game contains multiplayer competitive elements.

If EA sells me a full-price Mirror's Edge and then offers extra missions for further payment that's fine. If those missions are actually required to bring a narrative close to the story in the game I paid for then it's not.

If EA creates a franchise specifically for F2P then I reserve the right to hate it, but it's really no skin off my nose. If they take a franchise that I enjoy and subvert its nature to fit F2P then they're not just making something I dislike but they're destroying something I liked. It's not illegal but certainly a dick move.

If EA creates micro-transactions that are proxy subscription fees for ongoing services, that's fine by me. You can justify selling the ability to play if the game has a significant ongoing running cost. Justification goes out of the window if you've manufactured the need for that cost (always-connected DRM, unwanted social/multiplayer features).

If EA introduces micro-transactions to create an on-ramp between demo/piracy and full purchase then that could be acceptable. I'm okay with free to install and pay-to-play purchases for single-player games as long as I can get off the treadmill with a single purchase (infinite coins/fuel/whatever) that is in line with traditional game pricing. We also need to be able to release games from IAP treadmills in instances where the payment mechanism ceases to exist.

But Probably Not

I'm afraid that my instinct and experience tells me that we're in for a bad time no matter what arguments are made and heard. I foresee the inclusion of payment models that don't seek to increase revenue by upselling or extending the market but rather by getting people to pay more for less. I foresee micro-transactions shoe-horned into games and game designs broken to include them. I foresee a proliferation of virtual currencies as consumer "company scrip" and we'll have to collect EAbucks, Ubibeans, and Acticredits to play their games.

Ultimately it's just too tempting for publishers to ignore. F2P/IAP aren't just the kind of buzzwords that business types feel they must add to their CV. The massive numbers coming out of the top mobile freemium games are real. If you add to that the possibility that micro-transactions can be used to open a new front in the digital shop-front battle (Can't spend $EA on non-EA titles and no-one else gets a cut) then it becomes inevitable that the big boys are going to try it on.

I also frankly doubt that it's going to go away. Even if gamers vote with their wallets as CliffyB suggests and we see some big franchise releases bomb due to bad micro-transaction implementation once the systems are in place they will be used. We can only hope that market forces result in tolerable usage to most of us.

If there's a silver lining to this looming cloud it's that, by moving to payment models that are unpopular with large numbers of gamers, the big players will make space for others to offer alternatives. "Pay once, play forever" might just become a selling point in itself that indies can exploit.

*Micro-transactions were once touted as a mechanism for monetising web content and literally meant "small value transactions". The idea was that you could charge 0.5c or whatever for reading a blog post. These transactions were enabled via virtual currency (to avoid the overhead of real currency transactions). The meaning in games has taken great liberties with the interpretation of "small value" and now it really just means the use of in-game currency to buy stuff, whether for 10c or $10.

Wednesday, 27 February 2013

Game Camera Work

My on-off dev schedule has been on again recently and with it I may as well resurrect this neglected blog space.

I did a bit of work on the camera for one of my game projects and thought the progression of the solution lent itself to a post. Then I remembered that I'd been wanting to try my hand at creating one of those dev video things and so I tried that.

The creation of the video was a whole adventure in finding a workable capture solution, a working microphone, working video-editing software and getting my mouth to produce coherent sentences but I'll save you the long story.

Here's the video:

Thursday, 21 June 2012

Monkey's Garbage Collection on iOS

Not long ago I finally got around to buying the bits needed for iOS builds: a new iPod Touch, a ridiculously priced block of white plastic that runs OSX and Apple's 99 USD blessing to run my own code on the device I've already paid for. Besides the money there was a certain cost in time and frustration getting everything up-to-date and working but it is done and I can now build my project and run it on my iPod.

Well I guess this is enough to let you in the club.

I never really swallowed Monkey's promise of write-once and run pretty much anywhere so I wasn't expecting completely smooth sailing. Sure enough there have been a few wrinkles to iron out: graphics differences, an odd lack of response to touch events and performance issues - including the garbage collection (or GC for short) wrinkle that this post is about.

Some time ago someone else tested my Box2D port on their iOS device and posted on the Monkey forums that they were seeing poor frame-rates. I've not heard anything since but it stuck in my mind as a concern so I was specifically looking for performance issues when I first tried my game on my iPod. Happily the performance looked fine. I played a level with a reasonable number of entities in it and it seemed to be keeping a steady 30fps. Then I restarted the level a few times and everything got slower. A few more restarts and it was barely responding to input.

I'm Not The Only Problem Here

It was clear that I had a resource leak that was somehow holding onto game data between level restarts. What was unclear was why this was only happening on iOS or at least only causing a slow-down on iOS. An internal game bug that collected actual processing elements, such as holding on to multiple active instances of the physics simulation, shouldn't only occur on iOS so it must be something else. It seemed as good a time as any to take a look at the XCode profiler so I fired it up and re-ran my test of restarting a level several times. Not much digging was required. The game was spending increasing amounts of time in the garbage collection code marking objects.

Here's a quick explanation of that last sentence. Monkey is a garbage collected language. That means that if you create an object you don't need to explicitly free or destroy that object when you're done with it. When the reference goes out of scope or otherwise becomes unreachable by your code Monkey makes the promise that the memory it uses will be recovered.

Now several of Monkey's target languages/platforms provide their own garbage collection implementations: Java (Android), Javascript (HTML5), C# (XNA) and Flash. When building to those targets that's what your game will use. The C++ targets (GLFW and iOS) have no native GC implementation and so one is provided in the Monkey code itself. All well and good.

Garbage collectors come in many different flavours but Monkey's C++ GC is essentially a mark and sweep collector. What this means is that when the collector runs it first traverses the tree of object references in your application and "marks" those that are reachable, then it reclaims the memory or "sweeps" those that weren't marked. What the profiler was telling me was that my game was spending more and more time doing that "marking" process.

As I said, it was clear that I had a resource leak and it didn't take me long to track that down. The leak was resulting from a failure to remove one of my UI elements from a list of mouse event listeners. As the list existed for the lifetime of the game the UI element and everything attached to it remained reachable and so were never collected.

It's Already Spotless. You Can Stop Cleaning Now.

Problem solved? Only sort of. Once I'd fixed the leak I re-ran the test on my ipod and was unhappy to see that the marking process was still the biggest single chunk of time per frame. In fact I built a test that did nothing more than create and hold a bunch of objects and show the time spent in marking -- it seems that there's an approximate cost of approx 1ms per 10,000 live objects per frame on my 4th gen iPod. That's not talking about collecting objects that have been discarded, that's the standing cost of just having those objects. On every frame the Monkey code walks the tree of objects and marks them all and it takes that long to do it. Here's a graph of the average per frame cost:

Click to embiggen.

Why is this a problem? It's a problem because this behaviour is different from all the other GC implementations. Let's say you create a million map cell objects in your game and then don't allocate anything else while your game is running. If you do that on any of the other targets the GC cost will probably be zero. On iOS it would be crippling (on GFLW desktop hardware the marking process is generally so fast that it's not a problem, but it's just a question of scale, not principle).

How many objects are likely to be in a game? It totally depends but it's really not that difficult to get up there. My game is just a physics platformer with quite small levels and it runs up 20-30 thousand live objects in some situations. I could certainly do things to reduce that number, but it's kind of strange that I would have to do so just to bend to the behaviour of the garbage collector.

Trying To Be Helpful But Just Getting In The Way?

There is an argument for aggressive garbage collection and it's that you minimise the pauses seen in other garbage collectors. If you look at Android, for example, it's not unusual to see the garbage collector stop your application for a hundred milliseconds or more while collections take place. In theory, if you collect frequently, the interruptions will be smaller and more consistent and avoid pauses that are noticeable to the user.

The theory breaks down when I look at the iOS numbers, however:

Firstly, the load isn't that smooth. One frame might take a 1ms hit, another will take a 7ms hit. If you're close to the frame limit then that's easily enough variance to give you issues.

Secondly, and more importantly, the behaviour means that you can't escape the GC cost and if you're trying to do so on other platforms then you may well be making things worse on iOS. With the other GC implementations you can minimise their impact by not allocating objects during your game loop. On iOS that doesn't matter because you pay for every object you have live. What that means is that object pooling or caching strategies will benefit you on Android and actively work against you on iOS. This is not a recipe for cross-platform consistency.

What To Do?

Option one is to accept the situation and reduce your numbers of objects. If you can do that then it's not a bad thing to do in general, it just won't gain you much outside of iOS. There comes a point when avoiding using objects becomes an unreasonable expectation in an object-oriented language though.

Option two is to start performing surgery on the Monkey garbage collector. Clearly this is way over the line of abstracted cross-platform development but needs must when the devil drives and all that. An obvious change to make is simply to add a memory allocation trigger to the GC process: "Don't do anything unless x objects or x kilobytes of memory have been allocated". This would make it behave somewhat more like the standard Java collector in that you can expect strategies like object pooling to reduce garbage collector activity rather than increase it.

That memory allocation trigger could be extended all the way to "Don't do anything unless a memory allocation fails" but doing so would guarantee expensive collections and wouldn't work with the way that the GC currently attempts to spread the cost over a number of frames.

What I'll probably do is wait until the game is pretty much complete and then see if I can reduce allocations/tweak a collection trigger so that levels can be played through without a collection. Then I can add a call to force a collection to occur when the game is transitioning to the results screen or somewhere else where the player won't be bothered by a frame-rate hiccup.

If/when I settle on something I'll probably post the code for others to use. That's if the standard collector isn't altered before then. In the meantime my advice is to just be aware that the GC on iOS behaves differently to that on Android/Flash/XNA and the difference could be problematic if your game is large and/or has tight performance requirements.

Monday, 21 May 2012

Tetris In... How Long?

Yes, I know I write very long blog posts. Feel free to skim. There's a playable thingy linked below!

The Preamble

I've been following Andy Moore's blogging, vlogging, podcasting and social networking output for a fair while. Andy is the dev behind Steambirds and has recently started iOS dev with Iceburgers. He's often got interesting things to say about his life as an indie, information about the business aspects, technical tips for AS3 coders and he's also involved in a lot of face-to-face social stuff (I'm neither an AS3 coder or social but it's fun to hear about them from a safe distance). As far as I can tell, he's an all-round decent chap.

But I'm not liable if he kills you and wears your skin as pyjamas.

However, one thing he's mentioned a few times in the past has always left me kind of perplexed: doing game-jams on very short timescales. I'm not part of the game-jam scene. I've started a few 48-hour jams in the past and sort of finished one of them. I normally take a look at the Ludum Dare theme to see if I'm inspired but I haven't yet been inspired enough to spend a weekend in crunch. I can at least see that you can create a complete game in 48 hours though. Andy talks about writing a game in five minutes.

My thoughts? In a word: "bollocks". I didn't think he was lying -- he strikes me as a very honest person. I just had to assume that there was some glossing over of the reality of what comes before that five minutes and that there would be a large gap between my expectations of something I'd call "a game" and what he was talking about creating. Could it be a bit of fun? Maybe. Some sort of lesson about scope or perhaps about using short goal-oriented sprints to get past motivation issues or to test simple ideas? Perhaps.

The Actual Thing

So, anyway, Andy's been on a bit of a video spree recently. He's started a new vlogging series, which you should check out, and he's been doing a bit of live streaming. A few days ago he did a stream with a five minute dev challenge, so I got a chance to see what was up. Well, pretty much what I thought. The starting point was a template project ready to go and the end result was a moving a block with the mouse and a debug print if that block collided with some other blocks. That's no criticism of Andy as it's pretty brave to try such a thing in front of a live audience. To be honest I was relieved not to be proven wrong. Frankly it would have been awful if he'd whipped up a complete platformer or something in the time.

Harsh, but I told you he was honest.

This morning I noticed that Andy had tweeted that he was going to do another live stream; this time to build Tetris in fifteen minutes. To me this was much more interesting that just "a game" in five minutes. Tetris is a game that we're familiar with. It's not very complicated, but it's not utterly trivial either. I've never implemented it before, but it's something that I'd suggest as a project for someone just starting out with games coding and expect them to take a few days or maybe a week or so to pull something together. I'd imagine that an experienced coder could turn a basic, unpolished but technically solid version around in a day. How quickly could you just hack it together though? Surely not fifteen minutes?

I decided to see what I could do with plain out-of-the-box Monkey and then compare with Andy's run at it. Handily, he put a countdown timer on his stream, which meant that I could kick off at t-15 and see how far I got. Before I started, I quickly thought of what makes up a valid "Tetris" to me:

Grid x by y (I couldn't recall the dimensions)
All the Tetris shapes in differing colours
Random piece generation plus next piece indicator
Pieces can be moved left,right,down and can be rotated
Shape movement is locked to grid (vertical drop visual increment can be smaller, but left and right are snapped to the grid and blocks can only settle at a snapped position)
Line removal
Fail state when a block settles at the top

There are probably some bits missing that Tetris fans would insist on and maybe some parts could be argued as irrelevant to the core design on the other end but I think it's a reasonable starting point. Here's how the fifteen minutes went:

2 minutes: Creating new Jungle IDE project. Laptop is being slow so it takes a while.
4 minutes: Written the main function, app template and an outline Grid and Piece class.
5 minutes: Jungle starts flaking out. Throws error dialogs when trying to autocomplete type definitions.
7 minutes: I decide that Jungle is borked and restart it.
8 minutes: Jungle restarts and informs me there's an update. I decide to take it in case it resolves the instability I just experienced.
12 minutes: I've got the grid render written but untested
14 minutes: Base definitions for five pieces added. Not sure if the structure is good.
15 minutes: I've added left, right and drop movement and the code for instantiating a random piece from the set. and started looking at the collision. Time's up. I haven't even compiled yet.

So much for that. I watch Andy make his attempt. Andy gets something up on screen, which is more than I do. Essentially, he gets a block that follows the mouse. You can drop the block and they will stack. It's very noticeable that he pays very little attention to most of the stuff I listed above. For example, I wrote multiple shapes from the start and grid-based movement while Andy did neither. His focus was entirely on getting something on screen, not on getting Tetris on screen.

Again, this was pretty much what I expected and I'm relieved that he didn't blow me out of the water. Thumbs up from me for even trying it on a live stream. I left the stream not long after (need to preserve the bandwidth down here in NZ) but before I did there were a few suggestions for extra increments and Andy seemed to admit that he hadn't considered elements like row removal in what he had written. Clearly we had come at the problem from entirely different angles. If anything, we saw completely different problems.

I went off and toasted a bagel, made myself a coffee and thought about it a bit more. I was kind of proven right that you can't really write Tetris in fifteen minutes, but how long would it take to complete my list of functional requirements? I could afford a morning to see how far I got, so that's what I did. I got myself a little timer app and started doing micro-iterations. I stopped the timer if I got interrupted or when I got to a stage where it was worth making a note or taking a screen grab. The approach was to meet the functional requirements but to ignore any desires to pretty the code unless it helped me get where I was going faster. Here are my notes:

20 minutes: After umm-ing and ahh-ing over collision for a minute I just hacked in the first thing that came to mind. It won't handle rotation though.

22 minutes: Attempt first compile expecting to see block dropping down the screen. Of course there are compilation errors.

25 minutes: Errors cleared. I have to fix them one at a time thanks to Monkey's "something's wrong, I give up!" attitude to parsing. It runs, but I've got a nice blank screen for my troubles.

28 minutes: I had forgotten to set the update rate, which on HTML5 results in no updates at all. I'd also neglected to consider that the active piece needs to be added to the grid as it moves or be drawn separately. I implemented code to remove and re-add itself as it moves. I don't like it. Still, there's something on screen. As soon as I saw it I realised that I haven't put in detection for hitting the floor. Also, the size is clearly horribly wrong.

Awesome, isn't it?

34 minutes: I now have blocks dropping, colliding, settling and a new block coming in at the top (although it doesn't appear in the right place).

43 minutes: Added left, right and drop movement. Code starting to get hairy with ugly bounds checks and cut and paste. Although the movement works, there's a bug where I get to keep moving old pieces.

56 minutes: I've now got row removal. The code is a mess though. Also fixed the bug from the last iteration, which was caused by me failing to create copies of the base pieces so I was recycling references.

Look, one of each. That's amazing... oh, it's a bug.

58 minutes: Added a fail state. All it does is check if a spawned piece is already colliding. If it is then it prints "Failed" to the console and re-initialises the grid. Not very satisfying and probably different from Tetris proper but it matches the requirement. Rotation next. This will be quite a big chunk, I think.

1h16m: I've got rotation (one way) apparently working. I'm totally unconvinced by the code though. If I wasn't working against a ticking clock and actually intended to build a Tetris for release I'd definitely stop at this point and rethink. I've done it by changing all shape definitions to be 4x4. These are rotated clockwise and then the shape is moved to the top left of the grid again. It's quite inelegant. Also, I've just realised that it doesn't account for collisions on rotation.

1h21m: Just fixed the rotation so that it won't rotate into a colliding position. I also added the two "L" shaped blocks that I missed from the Tetris set. I just need the next piece indicator now as far as features.

1h31m: I've got the next indicator in. It's not pretty and the code to do it is a chuckle-worthy hack for re-using the existing grid code. However, we're now feature complete. Now to do a bit of testing and look at some pictures of actual Tetris to see about grid dimensions and stuff.

Fat and grey. Like creator, like creation, I guess.

1h50m: Right, a bit of research dug out the standard play area size (10x20) and the block colours. Getting the block colours involved picking them off an image from wikipedia. That required me to find my "everywhere eyedropper" tool that I rarely use and then paste all the colours into the code, correct the formatting and write the bit to reference the palette. Hence nearly twenty minutes. Also, I've found a crash bug when you lose, which I shall now have to fix...

1h55m: Fixed the crash bug, and also moved the spawn to the middle. However, I also noticed a bug where pieces weren't colliding properly on the right side. I fixed that and then realised that it now stops you rotating a piece on the right if it goes off the playfield. I'm sure in the original that it simply shifts the piece over in that scenario.

2h04m: Okay, I've added a check to see if the rotated piece is hanging off the right and, if so, it attempts to move it in. The piece still won't rotate if it's blocked by another Tetris piece on the right but I'm less sure if the original does this. As I'm also unsure of the precise row removal rules I'm going to leave it here as "job done".

The Result

I guess it's done... hey, look the next piece is a long one. Do we need multi-line bonuses?

If you want to play it, you can do so here in html5 and here built to flash. The controls are up arrow to rotate, left and right arrows for the obvious and down is a surprise... not really. If you want the code for some insane reason, it's here.

The Opinionated Bit

So, based on that, and assuming that I can at least be considered in the range of "average" as a programmer, the realistic timescale for banging out a recognisable and playable Tetris clone is about two hours. To be honest, considering I stopped the timer if I went to the loo, made a coffee or took a break and played Portal 2 for half an hour and those all allow a bit of subconscious processing I'd put it higher.

If we throw out "realistic" and talk about "possible", what then? If you discard the issues I had with my tools, then 1h45ms. If you discard my fussiness about the colours, playfield size and such then maybe 1h15m. If you're just hugely better than me and/or have written very similar code before then I'm sure you could do it in under an hour, but I think we're getting into an area where the great majority of people who try to do it are going to fail to come close. The only way I can believe you'd do it in 15 minutes is if you're pretty much just typing it from memory and that's not development as far as I'm concerned.

So, two hours for Tetris. Still quite fast, eh? Here's the thing though. While what I've got is "Tetris" it's so far away from being a releasable game that it's unreal. Firstly, the code sucks. Most of it is right on the brink of being more expensive to continue with than to rewrite. Even working on my own I'd be unable to face leaving it as is and there's no way it'd stay that way if the code was shared.

Beyond the technical debt there are still a whole bunch of missing features: there's no score or length of time played to provide achievement; no score-board to record that achievement; no difficulty curve; no music or sound effects; no menu system/options. It looks bloody awful. The movement is okay-ish but needs work. The row deletion mechanism is questionable as a modern Tetris clone and just uninteresting and unsatisfying aesthetically. I've no doubt there are bugs in there. I'd say there was at least another several days of work and tuning to get it even done enough to throw out as a freebie web game (not that I would, the world has enough Tetris clones).

Would I encourage someone to do a game jam or prototype sprint? Sure. I think it's great for people to just try building a game. I do wonder sometimes if the message gets confused between "Just have a go" and "It's easy" though. If you want to try to make a game in five minutes or fifteen minutes or even Tetris in two hours then go for it. Can you do it? Maybe, although you may have to bend the meaning of "making a game" at the short timescales. Will you do it? Probably not. Is it worth doing? Only if you enjoy it and maybe finding out if you enjoy it, even if you fail, is the real thing to take away if you're interested in games development.

Sunday, 6 May 2012

1D vs 2D Arrays, the Performance Reality

Programmers are generally seen, and like to think of themselves, as logical, rational people. Sometimes we get titles like "software engineer" that flatter us with the implication that we deal in known quantities and proven methods (at least to whatever extent that the reputation of the title "engineer" has survived its abuses).

Unfortunately, the truth is that all of us occasionally operate more on faith than fact and there are large numbers of coders who work with a cargo-cult knowledge base rather than a deep understanding of their work. It may seem odd for such an definitively digital field but human nature manages to create myth, rumour and unquestioned traditions of received knowledge even in the cold clarity of a world of ones and zeroes.

Even if you avoid the pitfalls of the incurious mind, there are plenty of errors of over-simplification and failure to differentiate context to be made. What's true in one language, on one platform, or in one framework, may not be true on another. Knowledge can be falsified by time as the underlying technology changes: a new processor, a new OS, a new language revision, an updated library.

These traps are common enough for programming in general but they're thick on the ground in the realm of performance. If you push it again to dealing with performance concerns on a cross-platform, translated programming environment like Monkey there are almost more traps than safe ground.

All of which brings me to the topic of the post, because the "fact" that 1D arrays are hugely faster than 2D arrays is one that I've seen repeated on a number of occasions in the Monkey forums. I asked in the past if there was any evidence for the claim but didn't get any reply. While I understood why it might be true in certain circumstances, it was hard to swallow as an unsupported assertion. So, when it recently raised its head again, I got the urge to see if it stood up to scrutiny.

1D? 2D? What?

I'll just quickly cover what we're talking about here. In the world of 2D games, it's very common to use some sort of equally 2D array of values to represent aspects of the game world. An obvious example would be a tiled map where the world is basically a grid of tile-type values. The straight-forward approach to this is to use an actual 2D array so that accesses are of the form:

Local array2d:Int[][]
'initialise array here
Local val:Int = array2D[row][col]

The proposed faster alternative is to use a 1D array and to do the indexing calculations yourself:

Local array1d:Int[] = New Int[ROWS*COLUMNS]
'initialise array here
Local val:Int = array1D[row*COLUMNS + col]

Why would that be faster?

Firstly, if you're iterating over an array in some manner, you can avoid some of the indexing arithmetic. Hitting every element only requires incrementing a single index and you can traverse a column by simply incrementing by the row length. This is one step removed from the standard C technique of using a pointer to traverse an array rather than a subscript index (which is also questionable depending on circumstance, but that's another story).

Secondly, it avoids the construction of multiple arrays as Monkey uses the array of arrays model for >1D. This might be cheaper to initialise, it also might be cheaper on garbage collection costs if you're throwing these arrays away. When accessing a value, it's possible that there are costs from the row indirection that are avoided by directly calculating an index. There's also a possibility that you'll get more benefit from lower-level things like predictive caching if all the data is in one consecutive lump.

Sounds convincing. Why doubt it?

Because none of that may be true for a given target or implementation. Remember that Monkey isn't a directly compiled language. It gets translated to another language and then that code is compiled for a platform target. Those targets have very different properties and optimisation paths. Making broad simplistic statements about micro-optimisation performance across all those targets seems, well... simplistic. Pushing those statements to the point where they're blanket "1D is always much faster than 2D" is even worse.

I've been doing this coding thing for a fair while now and much of it has been spent in environments where performance is a significant concern. I've worked on low-level C programming, embedded Java apps, desktop .NET applications and in high-level scripting languages. Every one has thrown up instances of people (including me) doing things based on "facts" about performance that turned out to be entirely untrue, partially untrue or just so small in effect that there was no point in doing it.

The Tests

I set up a very simple test that constructed and initialised the values of an array, performed sequential iteration over all the elements of the array and then a large number of random accesses to the array, timing each group separately. The accessed array data was just added to a total value to minimise the costs outside of the array accesses and loop structures required.

The test ran with five variations on a one million element array of Ints: 10x100,000, 100x10,000, 1,000x1,000, 10,000x100 and 100,000x10. What's that about? Well the major difference between a 2D array and a 1D array in most OO languages is that the 2D array is an array of arrays. This means that each row is another array instance, therefore there is potentially a big difference between a "wide" array versus a "tall" array (for the purposes of this post I'll talk about arrays as row x col).

So, lets take a look at some graphs shall we? Who doesn't like graphs?

Construction And Initialisation

Here are the results from the construction and initialisation part. The array instances were constructed and then the elements initialised to known values:

The graph is displaying the percentage difference between the timing for a 1D array and a 2D array. So, if the 2D array took twice as long, that would register as 100%. If it took half as long, then it shows as -50%.

There are some obvious big spikes on the Android and XNA platforms when dealing with the "tall" arrays, which is where I'd expect to see the biggest difference because of the numbers of objects being constructed. The gap rapidly narrows though and it's 20-30% for the square array. The Flash results are a bit weird but that double spike is repeatable - I don't know what's going on there - the other variants are pretty much even though. The GLFW target shows the 2D array as faster in all but the tallest case.

The HTML5 results I'm showing here are from Chrome 18. I didn't spot how decidedly screwy they were until I plotted the graph. Not screwy as in the results are wrong, just screwy as in those results are bizarre. For whatever reason, this particular test runs very slowly on the 1D array. I'll deal with the various browsers separately later. For now just take it as a clear demonstration of how actual performance can vary from expectation.

Iteration

The next test is simply iterating over the whole array and summing all the values. Again, the graph shows the percentage difference in time for the 2D operation versus the 1D.

Those Android figures show a fairly solid advantage to the 1D array. The rest though? Outside of the one spike for the tall array on GLFW it's all pretty close. For the faster targets the variation is small enough that the 1ms timing precision is a factor. I certainly wouldn't say anything other than "they're pretty much the same" for XNA, Flash and HTML5.

Random Access

The last test was to generate a list of random row and column indices and then run through them, summing the values. This sort of accessing tends to frustrate low-level optimisation techniques like predictive caching.

Now that is a pretty dataset if I do say so myself. The pattern across all the targets is clear and I think it can do without detailed commentary.

Are We There Yet?

Okay, so what do we make of the "1D arrays are much faster than 2D" now we've got the data? Well first we can see the nugget of truth in it -- if your array is tall and narrow going with 1D indexing will likely be faster. On the other hand, you could just swap your columns and rows and it's going to be a dead heat or maybe even a win for the 2D representation. For the usual case where the array is square-ish I can't see that the small wins that are there are worth the bother.

Frankly the number of situations where 1D might be worthwhile seems like it's going to be pretty small. Even on Android, where the 1D array is consistently faster across the board, I'd struggle to justify fiddling about with array index calculations. "No way!" I hear the 1D fans shout "There's a 20-50% performance boost!". Well, yes and no. What we're looking at here is a micro-benchmark that is specifically intended to put the costs of array access in the spotlight and it does that by doing very little with the data accessed. If you're doing anything more than adding integers then the picture changes quite rapidly.

That's a chart showing the Android costs for 2D versus 1D when calling a simple addition function versus a function that does just an additional multiply, Sin and a Cos operation. The earlier tests did the addition inline and you can see that just the function call overhead on the addition mutes the advantage significantly. As the function gets more computationally intensive that effect increases and our simple bit of trig reduces everything but the worst case to under 5% difference.

Whatever array access costs are in play will quickly become swamped by any real calculation work done on the data being accessed. Unless your game loop is pretty close to the perfect storm of just summing the elements of an enormous array the actual visible frames per second speed increase from going with a 1D array is likely to be non-existent.

Don't Listen To Me

So, perhaps I've convinced you that 1D is a waste of time. Don't bother. Not worth it. Right?

Wrong.

Practically, what I'm saying is that you should just leave this technique in the toolbox until you've identified a performance hot-spot where you could gain some benefit. Hopefully the data above gives you some insight into what that situation might look like. If you find such a situation then bring it out, add a little messiness to your code and enjoy the benefits of a worthwhile speed boost in exchange.

More generally I'm saying that Monkey isn't BlitzMax, or C or Java or anything else. Micro-optimisations often fail to remain valid between compiler revisions on the same OS and hardware and they're far, far less likely to be valid across all Monkey targets. Further, when people lean on such tricks to gain performance it tends to blind them to larger optimisations at the data structure and algorithm level that can make a much greater difference and which are far more likely to hold value across multiple targets.

That HTML5 Thing

Oh, yes. So those Chrome initialisation results where 2D was many times faster than 1D? Here are the results for all the tests across Chrome, FF and IE:

There you have it. One target and there are circumstances where no matter if you go with 1D or 2D you'll lose by as much on one browser as you'll gain on another. Feel free to try to make a snappy catch-all recommendation from that.

Wednesday, 25 April 2012

Monkey Tips - Default Types And Implicit Type Conversion

This one is mostly for people new to Monkey and coming from other languages. Here's some Monkey code:

Local myVarA = 1
Local myVarB = 1.5
Local myVarC = True

If you're like me when I first saw Monkey, I assumed that what I was seeing there was type inference. If I don't declare the variable type then you'll work it out for me? Awesome. So why does this not work?

Local myVarD = "Hello World"

The answer is that no type inference is occurring. If you don't declare a variable's type then Monkey will default that type to be an integer. Further, Monkey performs implicit type conversion between some primitives. Floats can be converted to Ints and vice versa and Bools can be converted to Ints (but not Ints to Bools). You don't even get a warning that this is happening and the first three examples above all compile happily with you potentially none the wiser that you've just declared three Ints with a value of 1.

Now some people might consider that sort of thing convenient. For me, that the float and boolean to int conversions just go by without Monkey even raising an eyebrow is convenient like leaving your roller skates at the top of the stairs so you don't have to go rummaging in the wardrobe. You absolutely will end up accidentally lopping off your float precision at some point.

There is a partial answer to this danger and that's to switch on "strict mode" by adding the Strict keyword to the top of your file. This mode will raise an error if you don't declare a type - although it has to be said that it's not a very good error: "Illegal type expression" rather than something a little more descriptive like "Type is undefined". Strict mode doesn't do anything about implicit type conversion though and it still happens silently in literal assignments or calculations.

I should mention that Monkey does offer type inference, but you have to explicitly invoke it via the ":=" operator:

Local thisIsAFloat := 1.5

<sarc>Which, I'm sure you'll agree, is something you'd never forget or mess up. Even if you did miss the colon, you couldn't fail to spot such a glaring error, right?</sarc> Seriously, just use Strict.

Sunday, 1 April 2012

Monkey Tips - Boxes Caveats Part 2

This is a continuation from Part 1 of my discussion of primitive type boxing in Monkey. In this part I'll get into issues that primarily concern expectations if you're coming from Java or a language that handles auto-boxing in a manner similar to Java.

We're Not Writing Java Any More, Toto

One thing to be aware of is that Monkey's base box classes are not as integrated into the language as Java's implementation. In Java, the compiler will perform boxing in situations where Monkey will not. If I have a Java method that takes an Object reference:

public void add( Object value ){...}

Then Java allows me to call it with a boxable primitive value: add(10) or add(true), and the compiler will automatically box to an Integer object or a Boolean object respectively. Monkey won't do this. However, if, in Monkey, you declare a method that takes a box object reference:

Method Add:Void( value:IntObject )

Then you can call it with primitive values that are boxable by that type: Add(10).

Hey Toto, Is a Scuba Diver a Mammal or a Fish?

This one is more of a gotcha on a design level than usage. Let's say I have a generic container class that can store primitives or Object references and offers an overloaded Add method to do this:

Class MyContainer
Method Add:Void( value:Int )
Method Add:Void( value:Float )
Method Add:Void( value:String )
Method Add:Void( value:Bool )
Method Add:Void( value:Object )
End

Now this works fine, until you try to store an object that implements an unboxing method. If you do that, you get an error: "Unable to determine overload to use". As far as Monkey is concerned the thing you are passing can be an Object or it can be one of the primitives. In order to successfully store the box object you have to explicitly cast the reference: myContainer.Add( Object(myBox) ).

This might seem fair enough but it is somewhat odds with how things work in other OO languages. In Java, the rules are that if a match can be made without boxing or unboxing then that match is used and in this case the boxed object is a descendant of Object, so that's a clear match.

There's No Place Like Home?

Why doesn't Monkey do the same? Well, when I raised this problem on the Monkey forums the term "exact match" was used, implying that the box class isn't an exact match for an Object so the overload can't be resolved. I'd argue that this is inconsistent with basic OO concepts. In OO, descendants have to be substitutable for their parent classes. In other words you have to be able to use a descendant class anywhere you can use the base class and so they are matches by definition. Requiring that you cast a reference to be an Object reference is a nonsense as all references must be Object references (because that's the root object type and everything descends from it).

I'd also argue that this behaviour is inconsistent with Monkey's own behaviour elsewhere. The "=" comparison mentioned in the first boxing post is an example of this. In that case our box objects could arguably be considered to match a primitive or an Object reference but Monkey clearly chooses to prefer to compare them as Objects, which makes sense because that's actually what they are.

Needless to say I'm not a fan of this language design choice. It may be that the majority of Monkey programmers will never come across the issue, or maybe they won't see it as a problem if they do, but it has certainly caused me some headaches. It forced me to avoid using an overloaded primitive/Object API for my JSON library with the result that the API is not as clean and consistent as I'd like. No, it's not insurmountable, but it's the kind of rough edge that can turn developers off a language. Being different is fine, but being arbitrarily different in ways that don't seem very helpful isn't.