Simple coordinates?

EDIT: first off I should say this is after watching ep 32, so maybe this has already changed in the next couple of videos. ENDEDIT.

Seems like you're spending an awful lot of effort detecting when a tile-relative coordinate leaves its tile, "recanonicalizing", etc., and all of that is caused by choosing to store your positions in a "decomposed" form, which means you basically have to re-implement your own weirdo-radix math routines (e.g. handle the "carry bit" when you move something out of a tile). I'd suggest storing things in a robust "raw" form and doing the decomposition only when needed. This removes the edge cases from the common case.

It would be much easier to just use two 64 bit ints for x/y, and define 1 in this unit to be 1/1024 of the tile size (or something.. enough to give you sub-pixel positioning). All the actual logic happens in "world units", and you don't have to handle any special cases anywhere. If you want to move the player you just add some world units to its location and there are no edge cases.

The "former edge cases" are automatically handled when the code that needs to know e.g. the tilemap coordinates does the modulo etc., but the difference is that this code is exceedingly trivial to write/read/understand. E.g. to get the tile coordinates just shift down 10 bits, to get the tile-relative offset mask out the lower 10 bits (and maybe scale it into SI units and use floats...), and more importantly it's no longer an "edge case" it's just the normal code you run all the time.

Edited by Sebastian on
Not being very experienced on the topic, I think the reason we don't do this is that emulating 64-bit operations on 32-bit CPUs is quite expensive. I don't really want to guess how much more expensive, but I would think probably an order of magnitude more CPU operations at least.

There are other advantages to having a "canonical"-style coordinate system, though. You get a lot more precision within the tile which, depending on how large our tiles end up on screen, could be noticeably advantageous. Also, you don't have to think about the division from 1/1024-units to tile units all the time, which I suspect would end up being about the same amount of work in later code as "canonicalizing" a coordinate.
Yeah, the compiler has to emulate 64 bit operations on a 32 bit CPU, but *that's effectively what the current code does anyway*. The tile relative float is basically the lower 32 bits of a 64 bit number; it just has a weird radix and uneven distribution. Checking the tile relative number for "overflow" and bumping the tile position when it overflows is really just handling the "carry bit" on the 32nd bit of a 64 bit number... except you do it with branching and custom tests instead of just letting the CPU/compiler do it.
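
In rough code, the comparison I have in mind looks something like this (the names are illustrative, not the actual stream code):

    #include <stdint.h>

    /* Manual carry handling: tile index plus tile-relative float. */
    static void MoveDecomposed(int32_t *PlayerTileX, float *PlayerTileRelX,
                               float dX, float TileSideInMeters)
    {
        *PlayerTileRelX += dX;
        if(*PlayerTileRelX >= TileSideInMeters)   /* hand-rolled "carry" into the high part */
        {
            *PlayerTileRelX -= TileSideInMeters;
            ++*PlayerTileX;
        }
        /* ...plus the mirror-image case for moving off the other edge, etc. */
    }

    /* The same move with a single 64-bit world coordinate: the compiler handles
       the carry for you (e.g. an add/adc pair on 32-bit x86). */
    static void MoveFlat(int64_t *PlayerWorldX, int64_t dXInWorldUnits)
    {
        *PlayerWorldX += dXInWorldUnits;
    }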

You wouldn't do the 1/1024 conversion "all the time"; you'd express everything you want to do in "world units" and only convert when finding the current tile or the current tile map. I'm not sure you even need to know the "tile relative" location for anything real, or if it's just a somewhat arbitrary decomposition to get more precision (corresponding to the lower 10 bits of the scheme I suggest).
Using 64-bit types in a 32-bit CPU/process is not as expensive as you might think. Definitely not an order of magnitude more. Most operations are very cheap (bitwise ops, add, sub). Multiplication is a bit slower. Division is a lot slower.

As long as you are using 64-bit types only for bitwise operations, adds, and subs, you should be OK performance-wise.
Well, some of ssylvan's concerns have already been answered over the course of this week, but I believe Casey's process in this respect is the main beauty of this series.

I am sure he has experience producing results that are clean and efficient, and as has become apparent in some cases (the pixel-to-world-units explanation, or the inverted y axis) he already knows the correct way to code them right off the bat. But he has chosen to write what came to mind first and build up to a cleaner, more efficient method while following a process. This allows those of us who have never coded anything like this to understand a process that arrives at clean code, as opposed to just being shown clean code that we would have to copy.
64-bit integer is definitely not a good option, because we'd end up having to do a bunch of 64-bit fixed point vector math, which would be a massive, massive speed hit. Remember that every single entity in the world is going to have to think about every single point of interest for them in these coordinates. There's no way we're going to be doing that in 64-bit integer.

64-bit float is definitely an option, but I chose not to do it because we're not always going to be on platforms with reasonable 64-bit float support, and because we have to do tile indexing anyway, so I feel like it will end up being a good compromise to store tiles with offsets.

Recanonicalizing will make more sense to you once you can see more of the pipeline at work, I suspect. Assuming it ends up somewhat like the 3D code I typically write, you loft everything out of tile coordinates into local 32-bit floats, you do all your math there (no tile indexes), and then you canonicalize back for storage. It allows you to do almost everything in full-width 32-bit floating point math, which is always the fastest math on today's hardware, and you only pay the in/out cost when you actually need to go back to tiles.
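
In case the shape of that is hard to picture, here's a rough one-axis sketch of the kind of thing I mean (placeholder names, not the actual engine code):

    #include <math.h>
    #include <stdint.h>

    typedef struct
    {
        int32_t TileX;     /* which tile */
        float OffsetX;     /* meters within that tile, in [0, TileSideInMeters) */
    } stored_pos_x;

    /* Loft a stored position into local 32-bit float space relative to some
       reference (e.g. the camera's tile), so the per-frame math is all float. */
    static float LoftToLocal(stored_pos_x P, stored_pos_x Ref, float TileSideInMeters)
    {
        return((float)(P.TileX - Ref.TileX)*TileSideInMeters + (P.OffsetX - Ref.OffsetX));
    }

    /* ...movement, collision, etc. happen on the local floats... */

    /* Canonicalize back for storage: fold whole tiles back into the tile index. */
    static stored_pos_x StoreFromLocal(stored_pos_x Ref, float LocalX, float TileSideInMeters)
    {
        stored_pos_x Result = Ref;
        float Total = Ref.OffsetX + LocalX;
        int32_t Tiles = (int32_t)floorf(Total / TileSideInMeters);
        Result.TileX += Tiles;
        Result.OffsetX = Total - (float)Tiles*TileSideInMeters;
        return(Result);
    }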

It's not as clean as if everything were 64-bit floats, which you could just use directly with abandon, but I think it will be more efficient in the long run, especially on lower-end platforms.

- Casey
Oh I'm not suggesting doing all the math in 64 bit! Once you have the offset for a player (or whatever) you can do a single 64 bit add which would be the analogue to the "recanonicalization step" (but without the complexity).

You can do your math in 32 bit in any space you want (e.g. relative the current center pixel).

So I'm not saying anything *terribly* different; I'm just saying that the "canonical" form can be a simple 64 bit int corresponding to micrometers (or whatever) instead of this ad-hoc "32 bit int and 32 bit float" combination. You could use 64 bit doubles and metres too, of course. That's possibly even simpler and probably still more than enough precision.

Edited by Sebastian on
I guess I don't see what is "ad hoc" about the current scheme, though. Assuming you move out of canonical, do your work, then move back in, why would 64-bit integers be less ad-hoc? If anything, they'd be more work, because instead of just doing your work in 32-bit float and then converting one of the results to a 32-bit integer for storage and lookup, you now have to convert both and pack them together?

- Casey

Edited by Casey Muratori on
It's Ad hoc (or "custom" or whatever you want to call it - I mean no value judgement by this btw) because in practice you have a 64 bit coordinate, but instead of just using a standard 64 bit number (e.g. an int64 or double), you use a 32 bit int for the "high bits" and a 32 bit float for the "low bits" with manual handling of the carry bit between the two halves. Using a standard int or double would be less ad hoc because it's just a standard type that everyone already knows.

I don't understand the last part of your post. There wouldn't be any "both". Storage is just one 64 bit number for X and one for Y. At the start of doing physics you might convert into "screen space" (by subtracting screen position and then casting to 32 bit float for easier math), and at the end of the physics etc. you convert back (by converting to 64 bit and adding back the screen position). Both of these conversions are perfectly trivial. For storage it's *just* two 64 bit numbers which express the X and Y in a "standard" C way.
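
In code, that round trip would be roughly this (assuming 1 world unit = 1 micrometer; all names invented for the example):

    #include <math.h>
    #include <stdint.h>

    #define UNITS_PER_METER 1000000LL   /* 1 world unit = 1 micrometer, say */

    typedef struct
    {
        int64_t X, Y;                   /* storage: just two plain 64-bit ints */
    } world_pos;

    /* Into screen space: subtract the screen origin, scale to meters, go float. */
    static float ToScreenMeters(int64_t World, int64_t ScreenOrigin)
    {
        return (float)(World - ScreenOrigin) / (float)UNITS_PER_METER;
    }

    /* ...physics/gameplay in 32-bit float, relative to the screen origin... */

    /* Back out: scale to world units, widen to 64-bit, add the origin back. */
    static int64_t FromScreenMeters(float Meters, int64_t ScreenOrigin)
    {
        return ScreenOrigin + (int64_t)llroundf(Meters * (float)UNITS_PER_METER);
    }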

For some reason I still like the 64 bit int approach because of its uniform precision, but you could do the math and ensure that a 64 bit double always has enough precision, and that would be really quite nice. You could have a PositionXMetres variable or something and you'd never have to explain what it means to anyone.

Edited by Sebastian on
ssylvan
at the end of the physics etc. you convert back (by converting to 64 bit and adding back the screen position).


That is the part that I am talking about being less efficient. As an example, what is the set of CPU instructions necessary to do this on a platform that does not have native 64-bit ints or 64-bit floats, or where the 64-bit floats are significantly more expensive (i.e., x86)? Furthermore, have you looked at the cost difference on platforms that do have them, given that you're then operating at half the SIMD width?

Anyway, like I said, I don't think it's an awful solution to use 64-bit approaches or anything, but I don't think it's a better solution than what we'll end up doing, and in fact I think it might be provably worse in terms of efficiency. But we'll have to see in the end, because it's definitely something that will only be completely clear once there's a lot more code.

- Casey
But won't that part be basically the same whichever method you use?

You're going to need to put all the objects for the current "screen" (and maybe a bit more) into some shared 32 bit float space so you can do your physics and gameplay etc. on them, and then convert them back for storage when you're done, right? So the only difference is what format the "storage" is done in.

So the question is whether it's cheaper to convert a 32 bit "screen space" float to a 64 bit "screen-relative offset" (either a cast, or a scale and a cast, depending on whether you use a double or an int64) plus a single 64 bit add, or whether it's cheaper to do your current "fmod"-style operations followed by a couple of 32 bit ops.

I'd suspect your current method is probably less efficient because it just seems more complicated (and I'm expecting the compiler to use the appropriate machine instructions to handle overflow etc. efficiently even on systems without native 64 bit ALUs). At the very least, it's harder for me to understand.

Edited by Sebastian on
The way I usually do this is to have a grid of "screens" which are each something like 64x64 tiles (if a "screen" is bigger than the actual screen, then the camera follows the player around within that "screen" or "level"). Then I basically store the player's "screen index" (int), which is which screen they're in, and their "screen position" (float), which is their position within their local "screen" or "level."

And then when combining / comparing, do the "canonization" thing to make sure everything's legit.
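
Roughly like this, in code (just a sketch of the idea, with names invented for the example):

    #include <stdint.h>

    #define SCREEN_TILES 64              /* each "screen"/"level" is 64x64 tiles */

    typedef struct
    {
        int32_t ScreenX, ScreenY;        /* which screen in the world grid */
        float OffsetX, OffsetY;          /* position within that screen, in tiles */
    } entity_pos;

    /* "Canonization": after moving, fold the offset back into range and bump
       the screen index so everything stays legit. */
    static void Canonicalize(entity_pos *P)
    {
        while(P->OffsetX >= (float)SCREEN_TILES) { P->OffsetX -= SCREEN_TILES; ++P->ScreenX; }
        while(P->OffsetX < 0.0f)                 { P->OffsetX += SCREEN_TILES; --P->ScreenX; }
        while(P->OffsetY >= (float)SCREEN_TILES) { P->OffsetY -= SCREEN_TILES; ++P->ScreenY; }
        while(P->OffsetY < 0.0f)                 { P->OffsetY += SCREEN_TILES; --P->ScreenY; }
    }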

I never considered doing it another way, it's great to hear about these other approaches / methods.
ssylvan
So the question is if it's cheaper to convert a 32 bit "screen space" float to a 64 bit "screen-relative offset" (either a cast or a scale and cast depending on if you use a double or int64) and a single 64 bit add, or if it's cheaper to do your current "fmod" style operations followed by a couple of 32 bit ops.

You have to do the divide either way if you want to have arbitrary sized tiles. If you don't want to have arbitrary sized tiles, then you don't have to do the divide in either case.

The difference comes in what you do after you normalize back to "tile coordinates". If everything is stored 32-bit, then you remain the same width through the entire pipeline, and can always be 4-wide. If you are 64-bit, then you have to repack up and downshift to 2-wide in addition to whatever operations you're going to do.
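
To make the width point concrete, here's roughly what that difference looks like with SSE2 (illustrative only, not actual engine code):

    #include <emmintrin.h>

    /* Four 32-bit float positions advance in a single 128-bit op... */
    static __m128 MoveFourWide(__m128 Positions, __m128 Deltas)
    {
        return _mm_add_ps(Positions, Deltas);
    }

    /* ...while 64-bit doubles only fit two to a register, so the same batch of
       work takes twice the instructions, plus any repacking around it. */
    static __m128d MoveTwoWide(__m128d Positions, __m128d Deltas)
    {
        return _mm_add_pd(Positions, Deltas);
    }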

Much like you, I only have a hunch that 64-bit will usually be slower :) We'll have to see what the actual thing looks like in the end to know. It may be that we end up not needing any of this because of something else that ends up happening :)

- Casey
Very interesting discussion here. I'd like to take the opportunity to ask you, Casey:
Why worry so much about performance now?

I mean, you always talk about optimizing things for speed later on, after everything has already been settled and is working as intended. Yet in this specific situation, you have chosen what seems to be a more complicated scheme for storing the coordinates, basing the decision on performance considerations that even you are not sure will be worth it down the line.

Don't take this personally, but it really feels like a double standard. This is just classic premature optimization in my book. Besides, I believe you would be missing a good opportunity to teach people, later on, why using 64 bits was really a bad idea: in the optimization step you could actually try to prove that 32 bits is indeed better. At that point you would have almost all of the game ready and could actually measure the speed difference, something you just can't do now.

This is just one instance where I noticed you optimizing things prematurely, and I personally think it is kind of bad for the purpose of the stream.

What are your thoughts?