Monday 21 May 2012

Tetris In... How Long?

Yes, I know I write very long blog posts. Feel free to skim. There's a playable thingy linked below!

The Preamble


I've been following Andy Moore's blogging, vlogging, podcasting and social networking output for a fair while. Andy is the dev behind Steambirds and has recently started iOS dev with Iceburgers. He's often got interesting things to say about his life as an indie, information about the business aspects, technical tips for AS3 coders and he's also involved in a lot of face-to-face social stuff (I'm neither an AS3 coder nor social but it's fun to hear about them from a safe distance). As far as I can tell, he's an all-round decent chap.

But I'm not liable if he kills you and wears your skin as pyjamas.
However, one thing he's mentioned a few times in the past has always left me kind of perplexed: doing game-jams on very short timescales. I'm not part of the game-jam scene. I've started a few 48-hour jams in the past and sort of finished one of them. I normally take a look at the Ludum Dare theme to see if I'm inspired but I haven't yet been inspired enough to spend a weekend in crunch. I can at least see that you can create a complete game in 48 hours though. Andy talks about writing a game in five minutes.

My thoughts? In a word: "bollocks". I didn't think he was lying -- he strikes me as a very honest person. I just had to assume that there was some glossing over of the reality of what comes before that five minutes and that there would be a large gap between my expectations of something I'd call "a game" and what he was talking about creating. Could it be a bit of fun? Maybe. Some sort of lesson about scope or perhaps about using short goal-oriented sprints to get past motivation issues or to test simple ideas? Perhaps.

The Actual Thing


So, anyway, Andy's been on a bit of a video spree recently. He's started a new vlogging series, which you should check out, and he's been doing a bit of live streaming. A few days ago he did a stream with a five minute dev challenge, so I got a chance to see what was up. Well, pretty much what I thought. The starting point was a template project ready to go and the end result was a block that moved with the mouse and a debug print if that block collided with some other blocks. That's no criticism of Andy as it's pretty brave to try such a thing in front of a live audience. To be honest I was relieved not to be proven wrong. Frankly it would have been awful if he'd whipped up a complete platformer or something in the time.

Harsh, but I told you he was honest.
This morning I noticed that Andy had tweeted that he was going to do another live stream; this time to build Tetris in fifteen minutes. To me this was much more interesting than just "a game" in five minutes. Tetris is a game that we're familiar with. It's not very complicated, but it's not utterly trivial either. I've never implemented it before, but it's something that I'd suggest as a project for someone just starting out with games coding and expect them to take a few days or maybe a week or so to pull something together. I'd imagine that an experienced coder could turn a basic, unpolished but technically solid version around in a day. How quickly could you just hack it together though? Surely not fifteen minutes?

I decided to see what I could do with plain out-of-the-box Monkey and then compare with Andy's run at it. Handily, he put a countdown timer on his stream, which meant that I could kick off at t-15 and see how far I got. Before I started, I quickly thought of what makes up a valid "Tetris" to me:
  • Grid x by y (I couldn't recall the dimensions)
  • All the Tetris shapes in differing colours
  • Random piece generation plus next piece indicator
  • Pieces can be moved left, right, down and can be rotated
  • Shape movement is locked to grid (vertical drop visual increment can be smaller, but left and right are snapped to the grid and blocks can only settle at a snapped position) 
  • Line removal
  • Fail state when a block settles at the top

There are probably some bits missing that Tetris fans would insist on, and maybe some parts could be argued irrelevant to the core design, but I think it's a reasonable starting point. Here's how the fifteen minutes went:

2 minutes: Creating new Jungle IDE project. Laptop is being slow so it takes a while.
4 minutes: Written the main function, app template and an outline Grid and Piece class.
5 minutes: Jungle starts flaking out. Throws error dialogs when trying to autocomplete type definitions.
7 minutes: I decide that Jungle is borked and restart it.
8 minutes: Jungle restarts and informs me there's an update. I decide to take it in case it resolves the instability I just experienced.
12 minutes: I've got the grid render written but untested.
14 minutes: Base definitions for five pieces added. Not sure if the structure is good.
15 minutes: I've added left, right and drop movement and the code for instantiating a random piece from the set, and started looking at the collision. Time's up. I haven't even compiled yet.

So much for that. I watch Andy make his attempt. Andy gets something up on screen, which is more than I do. Essentially, he gets a block that follows the mouse. You can drop blocks and they will stack. It's very noticeable that he pays very little attention to most of the stuff I listed above. For example, I wrote multiple shapes from the start and grid-based movement while Andy did neither. His focus was entirely on getting something on screen, not on getting Tetris on screen.

Again, this was pretty much what I expected and I'm relieved that he didn't blow me out of the water. Thumbs up from me for even trying it on a live stream. I left the stream not long after (need to preserve the bandwidth down here in NZ) but before I did there were a few suggestions for extra increments and Andy seemed to admit that he hadn't considered elements like row removal in what he had written. Clearly we had come at the problem from entirely different angles. If anything, we saw completely different problems.

I went off and toasted a bagel, made myself a coffee and thought about it a bit more. I was kind of proven right that you can't really write Tetris in fifteen minutes, but how long would it take to complete my list of functional requirements? I could afford a morning to see how far I got, so that's what I did. I got myself a little timer app and started doing micro-iterations. I stopped the timer if I got interrupted or when I got to a stage where it was worth making a note or taking a screen grab. The approach was to meet the functional requirements but to ignore any desires to pretty the code unless it helped me get where I was going faster. Here are my notes:

20 minutes: After umm-ing and ahh-ing over collision for a minute I just hacked in the first thing that came to mind. It won't handle rotation though.

22 minutes: Attempt first compile expecting to see block dropping down the screen. Of course there are compilation errors.

25 minutes: Errors cleared. I have to fix them one at a time thanks to Monkey's "something's wrong, I give up!" attitude to parsing. It runs, but I've got a nice blank screen for my troubles.

28 minutes: I had forgotten to set the update rate, which on HTML5 results in no updates at all. I'd also neglected to consider that the active piece needs to be added to the grid as it moves or be drawn separately. I implemented code for the piece to remove and re-add itself as it moves. I don't like it. Still, there's something on screen. As soon as I saw it I realised that I haven't put in detection for hitting the floor. Also, the size is clearly horribly wrong.

Awesome, isn't it?

34 minutes: I now have blocks dropping, colliding, settling and a new block coming in at the top (although it doesn't appear in the right place).

43 minutes: Added left, right and drop movement. Code starting to get hairy with ugly bounds checks and cut and paste. Although the movement works, there's a bug where I get to keep moving old pieces.

56 minutes: I've now got row removal. The code is a mess though. Also fixed the bug from the last iteration, which was caused by me failing to create copies of the base pieces so I was recycling references.

Look, one of each. That's amazing... oh, it's a bug.
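For the curious, the row-removal pass boils down to something like this. A rough sketch in Python rather than the Monkey I was actually writing, and tidier than what I had at the time:

```python
EMPTY = 0

def remove_full_rows(grid, columns):
    """Delete completely filled rows and push empty rows in at the top."""
    # keep only rows that still have at least one empty cell
    kept = [row for row in grid if any(cell == EMPTY for cell in row)]
    removed = len(grid) - len(kept)
    # fresh empty rows shift everything above the deleted rows down
    return [[EMPTY] * columns for _ in range(removed)] + kept, removed
```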

58 minutes: Added a fail state. All it does is check if a spawned piece is already colliding. If it is then it prints "Failed" to the console and re-initialises the grid. Not very satisfying and probably different from Tetris proper but it matches the requirement. Rotation next. This will be quite a big chunk, I think.

1h16m: I've got rotation (one way) apparently working. I'm totally unconvinced by the code though. If I wasn't working against a ticking clock and actually intended to build a Tetris for release I'd definitely stop at this point and rethink. I've done it by changing all shape definitions to be 4x4. These are rotated clockwise and then the shape is moved to the top left of the grid again. It's quite inelegant. Also, I've just realised that it doesn't account for collisions on rotation.
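The rotate-then-snap-to-top-left approach I'm describing looks roughly like this, re-sketched in Python (the normalise step is my illustration of the idea, not the exact code I wrote):

```python
def rotate_cw(shape):
    """Rotate a 4x4 list-of-lists 90 degrees clockwise."""
    size = len(shape)
    return [[shape[size - 1 - col][row] for col in range(size)]
            for row in range(size)]

def normalise(shape):
    """Shift a non-empty shape back to the top-left of its 4x4 cell."""
    while not any(shape[0]):                       # drop leading empty rows
        shape = shape[1:] + [shape[0]]
    while not any(row[0] for row in shape):        # drop leading empty columns
        shape = [row[1:] + [row[0]] for row in shape]
    return shape

# e.g. the horizontal I-piece becomes vertical, snapped to the left column
i_piece = [[0, 0, 0, 0],
           [1, 1, 1, 1],
           [0, 0, 0, 0],
           [0, 0, 0, 0]]
rotated = normalise(rotate_cw(i_piece))
```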

1h21m: Just fixed the rotation so that it won't rotate into a colliding position. I also added the two "L" shaped blocks that I missed from the Tetris set. I just need the next piece indicator now as far as features.

1h31m: I've got the next indicator in. It's not pretty and the code to do it is a chuckle-worthy hack for re-using the existing grid code. However, we're now feature complete. Now to do a bit of testing and look at some pictures of actual Tetris to see about grid dimensions and stuff.

Fat and grey. Like creator, like creation, I guess.
1h50m: Right, a bit of research dug out the standard play area size (10x20) and the block colours. Getting the block colours involved picking them off an image from wikipedia. That required me to find my "everywhere eyedropper" tool that I rarely use and then paste all the colours into the code, correct the formatting and write the bit to reference the palette. Hence nearly twenty minutes. Also, I've found a crash bug when you lose, which I shall now have to fix...

1h55m: Fixed the crash bug, and also moved the spawn to the middle. However, I also noticed a bug where pieces weren't colliding properly on the right side. I fixed that and then realised that it now stops you rotating a piece on the right if it goes off the playfield. I'm sure in the original that it simply shifts the piece over in that scenario.

2h04m: Okay, I've added a check to see if the rotated piece is hanging off the right and, if so, it attempts to move it in. The piece still won't rotate if it's blocked by another Tetris piece on the right but I'm less sure if the original does this. As I'm also unsure of the precise row removal rules I'm going to leave it here as "job done".
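That right-edge nudge is a one-liner's worth of logic. Again a Python sketch of the idea rather than the actual Monkey, with cells as (x, y) pairs:

```python
def kick_in(cells, width):
    """Shift a piece's cells left until no x index hangs past the playfield."""
    overhang = max(x for x, _ in cells) - (width - 1)
    if overhang > 0:
        cells = [(x - overhang, y) for x, y in cells]
    return cells
```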

The Result

I guess it's done... hey, look the next piece is a long one. Do we need multi-line bonuses?
If you want to play it, you can do so here in html5 and here built to flash. The controls are up arrow to rotate, left and right arrows for the obvious and down is a surprise... not really. If you want the code for some insane reason, it's here.

The Opinionated Bit


So, based on that, and assuming that I can at least be considered in the range of "average" as a programmer, the realistic timescale for banging out a recognisable and playable Tetris clone is about two hours. To be honest, considering that I stopped the timer if I went to the loo, made a coffee or took a break and played Portal 2 for half an hour, and that those all allow a bit of subconscious processing, I'd put it higher.

If we throw out "realistic" and talk about "possible", what then? If you discard the issues I had with my tools, then 1h45m. If you discard my fussiness about the colours, playfield size and such then maybe 1h15m. If you're just hugely better than me and/or have written very similar code before then I'm sure you could do it in under an hour, but I think we're getting into an area where the great majority of people who try to do it are going to fail to come close. The only way I can believe you'd do it in 15 minutes is if you're pretty much just typing it from memory and that's not development as far as I'm concerned.

So, two hours for Tetris. Still quite fast, eh? Here's the thing though. While what I've got is "Tetris" it's so far away from being a releasable game that it's unreal. Firstly, the code sucks. Most of it is right on the brink of being more expensive to continue with than to rewrite. Even working on my own I'd be unable to face leaving it as is and there's no way it'd stay that way if the code was shared.

Beyond the technical debt there are still a whole bunch of missing features: there's no score or length of time played to provide achievement; no score-board to record that achievement; no difficulty curve; no music or sound effects; no menu system/options. It looks bloody awful. The movement is okay-ish but needs work. The row deletion mechanism is questionable as a modern Tetris clone and just uninteresting and unsatisfying aesthetically. I've no doubt there are bugs in there. I'd say there was at least another several days of work and tuning to get it even done enough to throw out as a freebie web game (not that I would, the world has enough Tetris clones).

Would I encourage someone to do a game jam or prototype sprint? Sure. I think it's great for people to just try building a game. I do wonder sometimes if the message gets confused between "Just have a go" and "It's easy" though. If you want to try to make a game in five minutes or fifteen minutes or even Tetris in two hours then go for it. Can you do it? Maybe, although you may have to bend the meaning of "making a game" at the short timescales. Will you do it? Probably not. Is it worth doing? Only if you enjoy it and maybe finding out if you enjoy it, even if you fail, is the real thing to take away if you're interested in games development.



Sunday 6 May 2012

1D vs 2D Arrays, the Performance Reality

Programmers are generally seen, and like to think of themselves, as logical, rational people. Sometimes we get titles like "software engineer" that flatter us with the implication that we deal in known quantities and proven methods (at least to whatever extent that the reputation of the title "engineer" has survived its abuses).

Unfortunately, the truth is that all of us occasionally operate more on faith than fact and there are large numbers of coders who work with a cargo-cult knowledge base rather than a deep understanding of their work. It may seem odd for such a definitively digital field, but human nature manages to create myth, rumour and unquestioned traditions of received knowledge even in the cold clarity of a world of ones and zeroes.

Even if you avoid the pitfalls of the incurious mind, there are plenty of errors of over-simplification and failure to differentiate context to be made. What's true in one language, on one platform, or in one framework, may not be true on another. Knowledge can be falsified by time as the underlying technology changes: a new processor, a new OS, a new language revision, an updated library.

These traps are common enough for programming in general but they're thick on the ground in the realm of performance. If you push on to dealing with performance concerns on a cross-platform, translated programming environment like Monkey there are almost more traps than safe ground.

All of which brings me to the topic of the post, because the "fact" that 1D arrays are hugely faster than 2D arrays is one that I've seen repeated on a number of occasions in the Monkey forums. I asked in the past if there was any evidence for the claim but didn't get any reply. While I understood why it might be true in certain circumstances, it was hard to swallow as an unsupported assertion. So, when it recently raised its head again, I got the urge to see if it stood up to scrutiny.

1D? 2D? What?


I'll just quickly cover what we're talking about here. In the world of 2D games, it's very common to use some sort of equally 2D array of values to represent aspects of the game world. An obvious example would be a tiled map where the world is basically a grid of tile-type values. The straightforward approach to this is to use an actual 2D array so that accesses are of the form:

Local array2d:Int[][]
'initialise array here
Local val:Int = array2d[row][col]

The proposed faster alternative is to use a 1D array and to do the indexing calculations yourself:

Local array1d:Int[] = New Int[ROWS*COLUMNS]
'initialise array here
Local val:Int = array1d[row*COLUMNS + col]

Why would that be faster?


Firstly, if you're iterating over an array in some manner, you can avoid some of the indexing arithmetic. Hitting every element only requires incrementing a single index and you can traverse a column by simply incrementing by the row length. This is one step removed from the standard C technique of using a pointer to traverse an array rather than a subscript index (which is also questionable depending on circumstance, but that's another story).
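To make the first point concrete, here are the two traversal patterns in Python-flavoured form (the post's snippets are Monkey, so treat this purely as an illustration of the indexing, not as target code):

```python
# Illustrative stand-in data for a ROWS x COLUMNS grid flattened to 1D.
ROWS, COLUMNS = 4, 3
array1d = list(range(ROWS * COLUMNS))

# Whole-array pass: one incrementing index, no row/column arithmetic.
total = 0
for i in range(ROWS * COLUMNS):
    total += array1d[i]

# Column pass: start at the column index and step by the row width.
col = 1
col_sum = 0
for i in range(col, ROWS * COLUMNS, COLUMNS):
    col_sum += array1d[i]
```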

Secondly, it avoids the construction of multiple arrays as Monkey uses the array of arrays model for >1D. This might be cheaper to initialise, and it also might be cheaper on garbage collection costs if you're throwing these arrays away. When accessing a value, it's possible that there are costs from the row indirection that are avoided by directly calculating an index. There's also a possibility that you'll get more benefit from lower-level things like predictive caching if all the data is in one consecutive lump.

Sounds convincing. Why doubt it?


Because none of that may be true for a given target or implementation. Remember that Monkey isn't a directly compiled language. It gets translated to another language and then that code is compiled for a platform target. Those targets have very different properties and optimisation paths. Making broad simplistic statements about micro-optimisation performance across all those targets seems, well... simplistic. Pushing those statements to the point where they're blanket "1D is always much faster than 2D" is even worse.

I've been doing this coding thing for a fair while now and much of it has been spent in environments where performance is a significant concern. I've worked on low-level C programming, embedded Java apps, desktop .NET applications and in high-level scripting languages. Every one has thrown up instances of people (including me) doing things based on "facts" about performance that turned out to be entirely untrue, partially untrue or just so small in effect that there was no point in doing it.

The Tests


I set up a very simple test that constructed and initialised the values of an array, performed sequential iteration over all the elements of the array and then a large number of random accesses to the array, timing each group separately. The accessed array data was just added to a total value to minimise the costs outside of the array accesses and loop structures required.

The test ran with five variations on a one million element array of Ints: 10x100,000, 100x10,000, 1,000x1,000, 10,000x100 and 100,000x10. What's that about? Well, the major difference between a 2D array and a 1D array in most OO languages is that the 2D array is an array of arrays. This means that each row is another array instance, therefore there is potentially a big difference between a "wide" array versus a "tall" array (for the purposes of this post I'll talk about arrays as row x col).
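The shape of each test run, shown here for the 2D case, was roughly as follows. This is a Python paraphrase of the Monkey harness, not the actual benchmark code, and the sizes are scaled down for illustration:

```python
import random
import time

def bench_2d(rows, cols, hits):
    """Time the three phases: construct+initialise, iterate, random access."""
    t0 = time.perf_counter()
    grid = [[0] * cols for _ in range(rows)]      # construction/initialisation
    t1 = time.perf_counter()
    total = 0
    for r in range(rows):                         # sequential iteration
        for c in range(cols):
            total += grid[r][c]
    t2 = time.perf_counter()
    for r, c in hits:                             # random access
        total += grid[r][c]
    t3 = time.perf_counter()
    return t1 - t0, t2 - t1, t3 - t2

hits = [(random.randrange(100), random.randrange(100)) for _ in range(10000)]
times = bench_2d(100, 100, hits)
```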

So, let's take a look at some graphs shall we? Who doesn't like graphs?


Construction And Initialisation


Here are the results from the construction and initialisation part. The array instances were constructed and then the elements initialised to known values:

The graph is displaying the percentage difference between the timing for a 1D array and a 2D array. So, if the 2D array took twice as long, that would register as 100%. If it took half as long, then it shows as -50%.
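In other words, the plotted value is just this (a trivial Python restatement of the definition above):

```python
def pct_diff(t1d, t2d):
    """Percentage difference of 2D timing relative to 1D: positive = 2D slower."""
    return (t2d - t1d) / t1d * 100.0
```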

There are some obvious big spikes on the Android and XNA platforms when dealing with the "tall" arrays, which is where I'd expect to see the biggest difference because of the numbers of objects being constructed. The gap rapidly narrows though and it's 20-30% for the square array. The Flash results are a bit weird but that double spike is repeatable - I don't know what's going on there - the other variants are pretty much even, though. The GLFW target shows the 2D array as faster in all but the tallest case.

The HTML5 results I'm showing here are from Chrome 18. I didn't spot how decidedly screwy they were until I plotted the graph. Not screwy as in the results are wrong, just screwy as in those results are bizarre. For whatever reason, this particular test runs very slowly on the 1D array. I'll deal with the various browsers separately later. For now just take it as a clear demonstration of how actual performance can vary from expectation.

Iteration


The next test is simply iterating over the whole array and summing all the values. Again, the graph shows the percentage difference in time for the 2D operation versus the 1D.


Those Android figures show a fairly solid advantage to the 1D array. The rest though? Outside of the one spike for the tall array on GLFW it's all pretty close. For the faster targets the variation is small enough that the 1ms timing precision is a factor. I certainly wouldn't say anything other than "they're pretty much the same" for XNA, Flash and HTML5.

Random Access


The last test was to generate a list of random row and column indices and then run through them, summing the values. This sort of accessing tends to frustrate low-level optimisation techniques like predictive caching.


Now that is a pretty dataset if I do say so myself. The pattern across all the targets is clear and I think it can do without detailed commentary.

Are We There Yet?


Okay, so what do we make of the "1D arrays are much faster than 2D" now we've got the data? Well first we can see the nugget of truth in it -- if your array is tall and narrow going with 1D indexing will likely be faster. On the other hand, you could just swap your columns and rows and it's going to be a dead heat or maybe even a win for the 2D representation. For the usual case where the array is square-ish I can't see that the small wins that are there are worth the bother.

Frankly, the number of situations where 1D might be worthwhile seems like it's going to be pretty small. Even on Android, where the 1D array is consistently faster across the board, I'd struggle to justify fiddling about with array index calculations. "No way!" I hear the 1D fans shout, "There's a 20-50% performance boost!". Well, yes and no. What we're looking at here is a micro-benchmark that is specifically intended to put the costs of array access in the spotlight and it does that by doing very little with the data accessed. If you're doing anything more than adding integers then the picture changes quite rapidly.


That's a chart showing the Android costs for 2D versus 1D when calling a simple addition function versus a function that adds just a multiply, a Sin and a Cos operation on top. The earlier tests did the addition inline and you can see that just the function call overhead on the addition mutes the advantage significantly. As the function gets more computationally intensive that effect increases and our simple bit of trig reduces everything but the worst case to under 5% difference.

Whatever array access costs are in play will quickly become swamped by any real calculation work done on the data being accessed. Unless your game loop is pretty close to the perfect storm of just summing the elements of an enormous array the actual visible frames per second speed increase from going with a 1D array is likely to be non-existent.

Don't Listen To Me


So, perhaps I've convinced you that 1D is a waste of time. Don't bother. Not worth it. Right?

Wrong.

Practically, what I'm saying is that you should just leave this technique in the toolbox until you've identified a performance hot-spot where you could gain some benefit. Hopefully the data above gives you some insight into what that situation might look like. If you find such a situation then bring it out, add a little messiness to your code and enjoy the benefits of a worthwhile speed boost in exchange.

More generally I'm saying that Monkey isn't BlitzMax, or C or Java or anything else. Micro-optimisations often fail to remain valid between compiler revisions on the same OS and hardware and they're far, far less likely to be valid across all Monkey targets. Further, when people lean on such tricks to gain performance it tends to blind them to larger optimisations at the data structure and algorithm level that can make a much greater difference and which are far more likely to hold value across multiple targets.

That HTML5 Thing


Oh, yes. So those Chrome initialisation results where 2D was many times faster than 1D? Here are the results for all the tests across Chrome, FF and IE:


There you have it. One target and there are circumstances where no matter if you go with 1D or 2D you'll lose by as much on one browser as you'll gain on another. Feel free to try to make a snappy catch-all recommendation from that.