Sunday, 6 May 2012

1D vs 2D Arrays, the Performance Reality

Programmers are generally seen, and like to think of themselves, as logical, rational people. Sometimes we get titles like "software engineer" that flatter us with the implication that we deal in known quantities and proven methods (at least to whatever extent that the reputation of the title "engineer" has survived its abuses).

Unfortunately, the truth is that all of us occasionally operate more on faith than fact and there are large numbers of coders who work with a cargo-cult knowledge base rather than a deep understanding of their work. It may seem odd for such an definitively digital field but human nature manages to create myth, rumour and unquestioned traditions of received knowledge even in the cold clarity of a world of ones and zeroes.

Even if you avoid the pitfalls of the incurious mind, there are plenty of errors of over-simplification and failure to differentiate context to be made. What's true in one language, on one platform, or in one framework, may not be true on another. Knowledge can be falsified by time as the underlying technology changes: a new processor, a new OS, a new language revision, an updated library.

These traps are common enough for programming in general but they're thick on the ground in the realm of performance. If you push it again to dealing with performance concerns on a cross-platform, translated programming environment like Monkey there are almost more traps than safe ground.

All of which brings me to the topic of the post, because the "fact" that 1D arrays are hugely faster than 2D arrays is one that I've seen repeated on a number of occasions in the Monkey forums. I asked in the past if there was any evidence for the claim but didn't get any reply. While I understood why it might be true in certain circumstances, it was hard to swallow as an unsupported assertion. So, when it recently raised its head again, I got the urge to see if it stood up to scrutiny.

1D? 2D? What?


I'll just quickly cover what we're talking about here. In the world of 2D games, it's very common to use some sort of equally 2D array of values to represent aspects of the game world. An obvious example would be a tiled map where the world is basically a grid of tile-type values. The straight-forward approach to this is to use an actual 2D array so that accesses are of the form:

Local array2d:Int[][]
'initialise array here
Local val:Int = array2D[row][col]

The proposed faster alternative is to use a 1D array and to do the indexing calculations yourself:

Local array1d:Int[] = New Int[ROWS*COLUMNS]
'initialise array here
Local val:Int = array1D[row*COLUMNS + col]

Why would that be faster?


Firstly, if you're iterating over an array in some manner, you can avoid some of the indexing arithmetic. Hitting every element only requires incrementing a single index and you can traverse a column by simply incrementing by the row length. This is one step removed from the standard C technique of using a pointer to traverse an array rather than a subscript index (which is also questionable depending on circumstance, but that's another story).

Secondly, it avoids the construction of multiple arrays as Monkey uses the array of arrays model for >1D. This might be cheaper to initialise, it also might be cheaper on garbage collection costs if you're throwing these arrays away. When accessing a value, it's possible that there are costs from the row indirection that are avoided by directly calculating an index. There's also a possibility that you'll get more benefit from lower-level things like predictive caching if all the data is in one consecutive lump.

Sounds convincing. Why doubt it?


Because none of that may be true for a given target or implementation. Remember that Monkey isn't a directly compiled language. It gets translated to another language and then that code is compiled for a platform target. Those targets have very different properties and optimisation paths. Making broad simplistic statements about micro-optimisation performance across all those targets seems, well... simplistic. Pushing those statements to the point where they're blanket "1D is always much faster than 2D" is even worse.

I've been doing this coding thing for a fair while now and much of it has been spent in environments where performance is a significant concern. I've worked on low-level C programming, embedded Java apps, desktop .NET applications and in high-level scripting languages. Every one has thrown up instances of people (including me) doing things based on "facts" about performance that turned out to be entirely untrue, partially untrue or just so small in effect that there was no point in doing it.

The Tests


I set up a very simple test that constructed and initialised the values of an array, performed sequential iteration over all the elements of the array and then a large number of random accesses to the array, timing each group separately. The accessed array data was just added to a total value to minimise the costs outside of the array accesses and loop structures required.

The test ran with five variations on a one million element array of Ints: 10x100,000, 100x10,000, 1,000x1,000, 10,000x100 and 100,000x10. What's that about? Well the major difference between a 2D array and a 1D array in most OO languages is that the 2D array is an array of arrays. This means that each row is another array instance, therefore there is potentially a big difference between a "wide" array versus a "tall" array (for the purposes of this post I'll talk about arrays as row x col).

So, lets take a look at some graphs shall we? Who doesn't like graphs?


Construction And Initialisation


Here are the results from the construction and initialisation part. The array instances were constructed and then the elements initialised to known values:

The graph is displaying the percentage difference between the timing for a 1D array and a 2D array. So, if the 2D array took twice as long, that would register as 100%. If it took half as long, then it shows as -50%.

There are some obvious big spikes on the Android and XNA platforms when dealing with the "tall" arrays, which is where I'd expect to see the biggest difference because of the numbers of objects being constructed. The gap rapidly narrows though and it's 20-30% for the square array. The Flash results are a bit weird but that double spike is repeatable - I don't know what's going on there - the other variants are pretty much even though. The GLFW target shows the 2D array as faster in all but the tallest case.

The HTML5 results I'm showing here are from Chrome 18. I didn't spot how decidedly screwy they were until I plotted the graph. Not screwy as in the results are wrong, just screwy as in those results are bizarre. For whatever reason, this particular test runs very slowly on the 1D array. I'll deal with the various browsers separately later. For now just take it as a clear demonstration of how actual performance can vary from expectation.

Iteration


The next test is simply iterating over the whole array and summing all the values. Again, the graph shows the percentage difference in time for the 2D operation versus the 1D.


Those Android figures show a fairly solid advantage to the 1D array. The rest though? Outside of the one spike for the tall array on GLFW it's all pretty close. For the faster targets the variation is small enough that the 1ms timing precision is a factor. I certainly wouldn't say anything other than "they're pretty much the same" for XNA, Flash and HTML5.

Random Access


The last test was to generate a list of random row and column indices and then run through them, summing the values. This sort of accessing tends to frustrate low-level optimisation techniques like predictive caching.


Now that is a pretty dataset if I do say so myself. The pattern across all the targets is clear and I think it can do without detailed commentary.

Are We There Yet?


Okay, so what do we make of the "1D arrays are much faster than 2D" now we've got the data? Well first we can see the nugget of truth in it -- if your array is tall and narrow going with 1D indexing will likely be faster. On the other hand, you could just swap your columns and rows and it's going to be a dead heat or maybe even a win for the 2D representation. For the usual case where the array is square-ish I can't see that the small wins that are there are worth the bother.

Frankly the number of situations where 1D might be worthwhile seems like it's going to be pretty small. Even on Android, where the 1D array is consistently faster across the board, I'd struggle to justify fiddling about with array index calculations. "No way!" I hear the 1D fans shout "There's a 20-50% performance boost!". Well, yes and no. What we're looking at here is a micro-benchmark that is specifically intended to put the costs of array access in the spotlight and it does that by doing very little with the data accessed. If you're doing anything more than adding integers then the picture changes quite rapidly.


That's a chart showing the Android costs for 2D versus 1D when calling a simple addition function versus a function that does just an additional multiply, Sin and a Cos operation. The earlier tests did the addition inline and you can see that just the function call overhead on the addition mutes the advantage significantly. As the function gets more computationally intensive that effect increases and our simple bit of trig reduces everything but the worst case to under 5% difference.

Whatever array access costs are in play will quickly become swamped by any real calculation work done on the data being accessed. Unless your game loop is pretty close to the perfect storm of just summing the elements of an enormous array the actual visible frames per second speed increase from going with a 1D array is likely to be non-existent.

Don't Listen To Me


So, perhaps I've convinced you that 1D is a waste of time. Don't bother. Not worth it. Right?

Wrong.

Practically, what I'm saying is that you should just leave this technique in the toolbox until you've identified a performance hot-spot where you could gain some benefit. Hopefully the data above gives you some insight into what that situation might look like. If you find such a situation then bring it out, add a little messiness to your code and enjoy the benefits of a worthwhile speed boost in exchange.

More generally I'm saying that Monkey isn't BlitzMax, or C or Java or anything else. Micro-optimisations often fail to remain valid between compiler revisions on the same OS and hardware and they're far, far less likely to be valid across all Monkey targets. Further, when people lean on such tricks to gain performance it tends to blind them to larger optimisations at the data structure and algorithm level that can make a much greater difference and which are far more likely to hold value across multiple targets.

That HTML5 Thing


Oh, yes. So those Chrome initialisation results where 2D was many times faster than 1D? Here are the results for all the tests across Chrome, FF and IE:


There you have it. One target and there are circumstances where no matter if you go with 1D or 2D you'll lose by as much on one browser as you'll gain on another. Feel free to try to make a snappy catch-all recommendation from that.

6 comments:

  1. I haven't done any profiling (maybe I will try it sometime) but I think your array examples don't really correspond to the typical situation in which I would recommend 1D arrays. Suppose you are making a roguelike game and the dungeon floor is a 2D array of integers or tile objects. Now when you are doing pathing calculations, in a 2D array you have to find the square to the northeast squares by:
    northeastSq = dungeon[ xPos + 1 ][ yPos + 1 ]
    Whereas in a 1D array, you will write:
    northeastSq = dungeon[ pos + 65 ]
    (your dungeon is 64 squares wide).
    That surely has to be a big performance win -not to mention the fact that you can store references to squares you are working with using a single index - and it comes up again and again (in my code anyway).

    Nice blog, by the way. I just discovered it today from a link in the Monkey forum.

    ReplyDelete
  2. I'm afraid that I don't see the big difference between your example and my tests. What you've written is pretty much what was tested except that you've incorporated another addition to the 2D case by choosing an access with an offet across both row and column. (I'm not sure how this manages to be the typical situation for a recommendation, by the way. Code that only accesses diagonals seems rather esoteric to me.)

    Is one addition faster than two? Sure. Is your 1D access going to be hugely faster than the 2D on the basis of that difference in the majority of games applications? Not according to the numbers above. Additions simply aren't that expensive in the scheme of things. About the only thing I can think of where you might get something really worthwhile out of this is some sort of cellular automata or an image processing application where you're spending all your time iterating over an array and performing calculations on sub-sections of that array.

    In terms of the ease of indexing, if you say that having a single value that is actually a packed 2D position is easier for you in some way then I won't argue. I don't find that myself though.

    ReplyDelete
  3. The diagonal example was just to show the difference in its clearest form. Of course there will be horizontal and vertical moves as well.

    And I wasn't talking about the ease of indexing, but the fact that you can easily store a single index in an int, whereas to store an (x,y) position you need a struct which in Monkey is a significant overhead as it is an object rather than a primitive.

    The example I was thinking of is the Djikstra algorithm for pathfinding, which is likely to be important for a roguelike game. But other AI and map generation algorithms would be similar.

    Okay, I tested it. On repeated runs for a 200x200 map, completely filling from a random point with 8-way movement, the 2D array took 98 ms on average while the 1D array took 56 ms. On a 100x100 map it was 26 ms vs 14 ms. So a factor of nearly two. Not a dealbreaker but potentially significant.

    Strangely, the first iteration of each test took over four times as long in HTML5 on IE. There must be some caching or JIT compilation going on. A sequence of ten iterations (first the 2D test, then the 1D test, then print the time for each) looked like:
    418 241
    97 57
    95 54
    99 54
    95 54
    107 61
    96 55
    97 58
    96 55
    96 53
    Seems like you should never trust the first iteration in HTML5. Or is it the iterations after the first you should not trust? I guess it depends...


    I can send you the code if you like but it's a bit long to post. Maybe I'll put it on Monkey coder.

    I don't recommend the use of 1D arrays willy-nilly, but for chess or roguelikes they are the way to go IMO.

    ReplyDelete
  4. But you don't need a struct/class. You can use two ints if you want. Besides, have you actually timed the overhead of accessing an object field on all the targets? Where does it sit in comparison to all the other operations that would go into building a frame or generating a map?

    I think somewhere along the line you're missing the point. The post clearly shows scenarios where the 1D case is faster than 2D. I openly state that there are situations where it is true and there's an entire titled section saying that you could potentially benefit from 1D in some cases. Perhaps your single test on one browser on one target is one of those cases.

    The point is that there's no reasonable way to make any blanket statement that 1D is the best choice without knowing the specifics of the algorithm and the intended target platform. I'd say that includes statements like "for chess or roguelikes they are the way to go" - it simply covers too many possibilities to be anything other than a simplification.

    ReplyDelete
  5. I need a struct because my algorithm requires making a list of the squares I am processing, and the adjacent empty squares I have found. With a 1D array, it's a list of ints, with a 2D array it's a list of points. This is the indexing advantage I was talking about.

    I don't want to get in a fight with you - I agree that premature optimisation is a bad idea in general and maybe we differ only in how confident we are that we can optimise certain situations. Also, I think you overestimate the difficulty of using 1D arrays; in many ways they are simpler and my 1D code was actually shorter than the 2D version. So for me 1D is the default when I have a game with a 2D array of cells.

    That said, even a 2X speed advantage (in truth I expected a little more) is not usually going to be a make or break issue - at most it would mean a game that ran better on slightly older PCs.

    I'll test a few more platforms and put it in Monkey Code and you can comment if you like.

    ReplyDelete
  6. There's no fight. The post is about a general statement about performance and your issues appear to be specifics. In terms of differing confidence my position is simply that outside of simple like-for-like comparisons such as "one addition is faster than two" you can find yourself rapidly divorced from reality when treating a micro-optimisation that is valid in one situation on one platform as a universal performance salve.

    I'll be happy to look at your example when you post it. It doesn't seem to challenge the data I've already gathered, which, as I said, shows that 1D can be faster than 2D in some situations and on some platforms.

    ReplyDelete