C pointer casting rules

What are the rules for pointer "up" and "down" casting?


Example given:
#include <stdio.h>

int main(int argc, char *argv[])
{
	unsigned int integer = 255;
	unsigned int *pointerInt = &integer;

	printf("char %u\n", *pointerInt);


	unsigned char *pointerChar = (unsigned char *)pointerInt;
	printf("char %d\n", *pointerChar);
}


This would print
char 255
char 255


If I increase integer to 256 the result is as expected:
char 256
char 0

The char jumps back to 0 if its boundaries are reached.


However, what happens here:
#include <stdio.h>

int main(int argc, char *argv[])
{
	unsigned char integer = 255;
	unsigned char *pointerChar = &integer;

	printf("char %d\n", *pointerChar);


	unsigned int *pointerInt = (unsigned int *)pointerChar;
	printf("char %u\n", *pointerInt);
}


This will print:
char 255
char "random unexpected number"

Does this happen because the pointer reads more memory than I actually set?


And how does the stack actually work?

If I had something like this:
int main(int argc, char *argv[])
{
	unsigned char integer = 255;
        unsigned int integer = 3000;
	unsigned char *pointerChar = &integer;

	printf("char %d\n", *pointerChar);


	unsigned int *pointerInt = (unsigned int *)pointerChar;
	printf("char %d\n", *pointerInt);
}


Would then the value pointerInt points to be consistent every time the program is executed?
My thinking is: even if I read beyond the char's boundaries, as I do with pointerInt, the value read through pointerInt should be the same every time, because I set the memory after integer to integer2.

So, going through it:

I write "integer" to the stack. Then I write "integer2" to the stack.
I print "integer". It's 255.
I increase the range of pointerInt. So pointerInt points to the same location as pointerChar did before, but its range is bigger because it is of type int. So it should now read "char integer" and 3/4 of "integer2".
I print the value. It should be something made up of 255 and 3000, since the stack is consistent, right?

I tested it but it isn't. Why?

Edited by Adrian on
So your assumptions so far are pretty much right, but you have to remember that the stack usually grows downward.

If you try this version:

#include <stdio.h>

int main(int argc, char *argv[])
{
	unsigned int integer2 = 3000;
	unsigned char integer = 255;
	unsigned char *pointerChar = &integer;

	printf("char %d\n", *pointerChar);


	unsigned int *pointerInt = (unsigned int *)pointerChar;
	printf("char %d\n", *pointerInt);
}


and compile it without any optimizations, it should give you consistent results. Also remember that as soon as you let the compiler optimize the generated code, all bets are off as to what goes where (as far as I know; somebody correct me if I'm wrong :-).
A pointer is never really cast "up" or "down" in this case. The compiler simply produces code that reads or writes a specific number of bytes (depending on the type: typically 4 bytes for int, 2 bytes for short, etc.) from or to memory. It's up to you to guarantee correct behavior of the program when reading or writing memory. Memory is just an array of bytes; how you "view" it, i.e. which 4 bytes in which place are actually an unsigned int variable, is up to you.

This would print
char 255
char 255

Be careful with this assumption; it is not universally true. Yes, on an Intel CPU it will do that, but on other architectures it could print 0. Some architectures are little endian, some are big endian.
Little endian means that the bytes of an integer are stored in memory with the least significant byte first. Big endian is the other way around.

On a little-endian CPU like Intel, your integer is stored in the following way:
+-----+-----+-----+-----+
| 255 |  0  |  0  |  0  |
+-----+-----+-----+-----+

But on big-endian (like PowerPC, or some MIPS and ARM) it will be stored like this:
+-----+-----+-----+-----+
|  0  |  0  |  0  | 255 |
+-----+-----+-----+-----+


Does this happen because the pointer reads more memory than I actually set?
Exactly.
If you looked at this in a debugger's memory window, you would see something like this:
    +-----+-----+-----+-----+
... | 255 | ??? | ??? | ??? | ...
    +-----+-----+-----+-----+
    ^
    |
    +-- this is where the "integer" variable is stored (its address)

Where ??? are unknown values. They could be the same every time you run the program, or they could be different. It depends on the code.

This kind of behaviour is called "undefined". What that typically means is that the compiler is free to do whatever it wants. In this case it simply reads whatever is in memory and gives back the value. But it is equally free to terminate the program, return from the function, or remove the code and never execute it. Be careful!

Would then the value pointerInt points to be consistent every time the program is executed?
It depends on the code. Usually it is not consistent.

	unsigned char integer = 255;
        unsigned int integer = 3000;

I assume the second integer is really integer2 (with value 3000)?

I increase the range of pointerInt. So pointerInt points to the same location as pointerChar did before, but its range is bigger because it is of type int. So it should now read "char integer" and 3/4 of "integer2".
Yes and no. It depends on the compiler. The compiler is free to put variables on the stack in whatever order it likes, or not put them there at all, or put some other internal temporary data between your variables. If it can figure out what you are doing with the variables in your function (as in all your examples here), it can simply print constant values without reading or writing the stack. Compilers can do this kind of analysis pretty trivially. You should never rely on the stack layout of local variables in your program.

For example, look at the disassembly here: https://godbolt.org/g/HSYKcF
You can see that the compiler simply calls the "print" function twice with the constant 255 as the argument. No reading or writing of the stack. It figured out what your function does at compile time.

Edited by Mārtiņš Možeiko on
What do you mean the stack grows downward?

int main(void) 
{
  char v1 = 20;
  char v2 = 40;

  char *p = &v1;
}


So v2 will actually be at a lower address than v1?

What happens when I do p++? Should it now point to v2, because v2 would be at the next lower address? Or does its address actually increase?

And if I write something to memory, let's say a bitmap pixel where every color is 8 bits, with an alpha:

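uint32 *Pixel = (uint32 *)Buffer->Pixel1;

uint8 Blue = Value;
uint8 Green = Value;
uint8 Red = Value;
uint8 Alpha = Value;

// Cast before shifting so the shifts happen on unsigned 32-bit values
*Pixel = (uint32)Blue | ((uint32)Green << 8) | ((uint32)Red << 16) | ((uint32)Alpha << 24);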


On Little Endian:
This will be stored in memory like |Alpha Red Green Blue| right?
But how do I read it? Do I read it the same way or swapped again so as I would expect it to be in the first place?

But on big endian, assuming I only write the first byte, like so:

*Pixel = Blue;

It would then be written to memory as follows: |Blue 00 00 00|
whereas on little endian it would be |00 00 00 BB|, am I correct?


Edited by Adrian on
adge
What do you mean the stack grows downward?

int main(void) 
{
  char v1 = 20;
  char v2 = 40;

  char *p = &v1;
}


So v2 will actually be at a lower address than v1?

What happens when I do p++? Should it now point to v2, because v2 would be at the next lower address? Or does its address actually increase?


This is undefined behavior because the compiler is free to not allocate space for v2 on the stack (keeping it entirely in a register).

The compiler is also free to reorder the variables on the stack for alignment or cache line purposes.

But in a typical unoptimized build, the memory layout of the variables on the stack will look something like:
struct {
	char *p;
	char v2;
	char v1;
	// + probably some padding to round the size up to 4 for alignment reasons
};


adge
What do you mean the stack grows downward?


This excellent intro to Reverse Engineering has a section on memory layout, if I recall correctly, which addresses stack and heap and the directions they grow in and what that means. HTH.
ratchetfreak
This is undefined behavior because the compiler is free to not allocate space for v2 on the stack (keeping it entirely in a register).

Even if the compiler allocates v2 on the stack right next to v1, it is still technically undefined behavior: you cannot dereference a pointer once it points outside the memory region it "belongs" to. For example, if you move a pointer with ++ or -- between the elements of an array, everything is OK. But once you move outside the array and dereference it, that is undefined behavior. In this code example p must always remain a pointer to v1; no ++ or -- should be applied to it.

As I said before: you should never rely on the stack layout of local variables in your program! Never!

This will be stored in memory like |Alpha Red Green Blue| right?
But how do I read it? Do I read it the same way or swapped again so as I would expect it to be in the first place?

No, in little endian it will be stored as [Blue, Green, Red, Alpha]. The first byte is the least significant one, in this case Blue (you shift it by 0). The most significant byte is Alpha, because you shift it by 24.

But on big endian, assuming I only write the first byte, like so:

*Pixel = Blue;

It would then be written to memory as follows: |Blue 00 00 00|
whereas on little endian it would be |00 00 00 BB|, am I correct?
No, in little endian it is [BB, 00, 00, 00]. In big endian it is [00, 00, 00, BB]. See the explanation above for why.

To avoid issues with endianness, you can always read and write memory as an array of bytes, instead of reading or writing an integer and then shifting. Like this:
uint8 *Pixel = (uint8 *)Buffer->Pixel1;
 
Pixel[0] = Blue;
Pixel[1] = Green;
Pixel[2] = Red;
Pixel[3] = Alpha;

This stores the pixel value as [Blue, Green, Red, Alpha] in memory, the same on both little-endian and big-endian systems.

Edited by Mārtiņš Možeiko on
Hmm, I think I have a fundamental problem understanding how memory works.

Let's say I have memory which is 4 bytes big, so 32 bits. The memory is handled as little endian, so the least significant byte is stored at the lowest address.

Byte significance increases to the right -->
+-----+-----+-----+-----+
|  0  |  0  |  0  |  0  |
+-----+-----+-----+-----+
1-----2-----3-----4


So I have the addresses 1 to 4. My pointer points to 1. And I have the variable int i = 32.

So I execute the following code:

int i = 32;
int *p = &i;


My memory would now look like this:
+-----+-----+-----+-----+
| 32  |  0  |  0  |  0  |
+-----+-----+-----+-----+
1-----2-----3-----4

Is this correct?

Now with the bitmap example:

unsigned char Blue = 255;
*Pixel = Blue;

Resulting memory:
+-----+-----+-----+-----+
| 255 |  0  |  0  |  0  |
+-----+-----+-----+-----+
1-----2-----3-----4

But what about the following code:

unsigned char Blue = 255;
unsigned char Green = 100;
unsigned char Red = 20;
unsigned char Alpha = 230;
*Pixel = Blue | Green << 8 | Red << 16 | (uint32)Alpha << 24;


In my mental model this would result in:
------------------+-----+-----+-----+-----+
Alpha Red Green   | 255 |  0  |  0  |  0  |
------------------+-----+-----+-----+-----+
------------------1-----2-----3-----4


With Green, Red and Alpha stored before address 1.
Because I think that << means: shift to the left (to the next lower memory address).

I am probably wrong, but I don't know why.

Edited by Adrian on
adge

Because I think that << means: shift to the left (to the next lower memory address).

I am probably wrong, but I don't know why.



Left-shift means shifting towards more significant bits, not less significant ones. For example,
int x = 0x0ab0;
int y = x << 4;


Here, y = 0xab00, not 0x00ab. Notice that the bit significance increases from right to left, not left to right.

So in your last example, the result would be (in little endian):
+-----+-----+-----+-----+
| 255 | 100 |  20 | 230 |
+-----+-----+-----+-----+
1     2     3     4


Edit: Also, displaying memory locations as increasing from left to right is just a convention. You could also represent them the other way around, which lines up intuitively with what the shifting does:
+-----+-----+-----+-----+
| 230 |  20 | 100 | 255 |
+-----+-----+-----+-----+
4     3     2     1

Edited by Mattie on
All right! Now it totally makes sense. Thank you!
Sizik
You could also represent them the other way around, which lines up intuitively with what the shifting does

Only on a little-endian system :)