Other ways to unmask the bitmap

I was thinking about the way Casey did his unmasking of the bitmap file, and I started wondering whether that was the easiest way of doing it. So I started playing with some bit operations to see if I could come up with another way of doing it without looping through all the bits. Mostly because I wanted more experience with what the bit operators do.

Anyway, this is what I came up with. I'm not saying it's a better way, nor that it's better looking. I just wanted to do it another way, without the loop count thing.
I also didn't account for the alpha channel, but I guess it wouldn't be a problem to implement.

Maybe this is just a stupid way of doing it?

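Int32 someColor = 0x123456;

// Channel masks; hard-coded here, in a real BMP they come from the header.
Int32 maskRed = 0xFF0000;
Int32 maskGreen = 0x00FF00;
Int32 maskBlue = 0x0000FF;

// Isolate each channel, still sitting at its original bit position.
Int32 redOut = maskRed & someColor;
Int32 greenOut = maskGreen & someColor;
Int32 blueOut = maskBlue & someColor;

// Move red up to bits 16..23 if it started lower.
if (redOut >> 16 == 0) redOut = redOut << 8;
if (redOut >> 16 == 0) redOut = redOut << 8;

// Move green to bits 8..15.
if (greenOut >> 16 != 0) greenOut = greenOut >> 8;
if (greenOut >> 8 == 0) greenOut = greenOut << 8;

// Move blue down to bits 0..7.
if (blueOut >> 16 != 0) blueOut = blueOut >> 16;
if (blueOut >> 8 != 0) blueOut = blueOut >> 8;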


Would it perhaps be an even more efficient way of swapping the channels?
I doubt this code will perform any better. Typically you want to avoid branches in inner loops, although in this case your branches will probably be predicted pretty well.

What Casey wrote is good enough. You could optimize it by creating specialized code for common cases (when maskRed=0xFF0000, maskGreen = 0x00FF00, maskBlue = 0x0000FF and other combinations) leaving current code as fallback.

Maybe if your compiler is good you could do something like this:
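// Assumed: redShift/greenShift/blueShift are the bit index of the lowest set bit of each mask.
redOut = (redShift < 16) ? (redOut << (16 - redShift)) : (redOut >> (redShift - 16));
greenOut = (greenShift < 8) ? (greenOut << (8 - greenShift)) : (greenOut >> (greenShift - 8));
blueOut = (blueOut >> blueShift);
colorOut = (redOut | greenOut | blueOut);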

This may be faster on ARM, where the compiler can use conditional instructions. Not sure about x86/x64.

Edited by Mārtiņš Možeiko on
One of these ways of counting the trailing zeroes without a loop might be a good default for platforms that lack the CPU instruction. I kinda like the look of ntz10, but who knows.

http://www.hackersdelight.org/hdcodetxt/ntz.c.txt

Every time I have to do some bitwise acrobatics I make sure to check my copy of Hacker's Delight first: http://www.eetimes.com/author.asp?section_id=31&doc_id=1286016
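For illustration, here is a minimal sketch of a loop-free trailing-zero count using a five-step binary search. It is written from scratch for this thread and is not a copy of ntz10 or any other routine in the linked file; the name count_trailing_zeros32 is made up for the example, and it assumes a non-zero 32-bit input.

```c
#include <assert.h>
#include <stdint.h>

/* Count the trailing zero bits of a non-zero 32-bit value without a loop.
   Each step tests whether the low half of the remaining bits is empty and,
   if so, skips over it. */
static uint32_t count_trailing_zeros32(uint32_t x)
{
    assert(x != 0); /* the result is undefined for zero in this sketch */
    uint32_t n = 0;
    if ((x & 0x0000FFFFu) == 0) { n += 16; x >>= 16; }
    if ((x & 0x000000FFu) == 0) { n += 8;  x >>= 8;  }
    if ((x & 0x0000000Fu) == 0) { n += 4;  x >>= 4;  }
    if ((x & 0x00000003u) == 0) { n += 2;  x >>= 2;  }
    if ((x & 0x00000001u) == 0) { n += 1; }
    return n;
}
```

Applied to a channel mask such as 0x00FF0000 this gives 16, which is the shift needed to bring that channel down to the low byte.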
What I did was this:
uint32 Red = *SourceDest & Header->RedMask;
uint32 Green = *SourceDest & Header->GreenMask;
uint32 Blue = *SourceDest & Header->BlueMask;
uint32 Alpha = *SourceDest & Header->AlphaMask;
//*SourceDest = (*SourceDest >> 8) | (*SourceDest << 24);
if (Header->RedMask == 255) {
	Red = (uint8)Red;
}
else {
	Red = (uint8)(Red / 255);
}
if (Header->GreenMask == 255) {
	Green = (uint8)Green;
}
else {
	Green = (uint8)(Green / 255);
}
if (Header->BlueMask == 255) {
	Blue = (uint8)Blue;
}
else {
	Blue = (uint8)(Blue / 255);
}
if (Header->AlphaMask == 255) {
	Alpha = (uint8)Alpha;
}
else {
	Alpha = (uint8)(Alpha / 255);
}


The nice thing about hex numbers is that:
0x0000ce00 / 255 = 0x000000ce (obviously just a bitshift)
but
0x00ce0000 / 255 = 0x0000cece (with the lower bits just what we want)
and
0xce000000 / 255 = 0x00cecece (again with the lower bits just what we want)

So if the mask is 255, just take the value as it is; otherwise do the above.

No idea about speed though.

And yes, there is a fourth alpha mask "variable" in the bmp struct.
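One edge case of the divide-by-255 quirk seems worth checking: a channel value of 0xFF that does not sit in the lowest byte divides evenly (0x0000FF00 / 255 = 0x100), so the low byte comes out as 0 instead of 0xFF. The sketch below compares the division against a plain shift; the function names are invented for this example, and the shift amount is assumed to be the bit position of the mask's lowest set bit.

```c
#include <assert.h>
#include <stdint.h>

/* The division approach from the post: mask, divide by 255, keep the low byte. */
static uint8_t channel_by_division(uint32_t pixel, uint32_t mask)
{
    uint32_t v = pixel & mask;
    return (mask == 0xFFu) ? (uint8_t)v : (uint8_t)(v / 255u);
}

/* Reference: mask, then shift the channel down to the low byte. */
static uint8_t channel_by_shift(uint32_t pixel, uint32_t mask, uint32_t shift)
{
    assert(shift < 32); /* shifting a 32-bit value by 32 or more is undefined */
    return (uint8_t)((pixel & mask) >> shift);
}
```

For 0xCE in any byte the two agree, but for a fully saturated channel, for example channel_by_division(0x0000FF00, 0x0000FF00), the division version returns 0 where the shift version returns 0xFF.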
You definitely want to do a shift instead of a division by 255. Division by a compile-time constant that is not a power of two will be implemented as a multiplication and a shift right. So one shift will be faster than your division.

Basically, to get from 0x00ce0000 to 0xce, it is better to do (x >> 16) than (uint8)(x / 255). Shifts are cheaper than multiplies or divides. Compilers know that too: they replace multiplies and divides by a power of two with shifts (for example, for unsigned x, x*16 == x<<4 and x/128 == x>>7).

Also, you will need to shift the red, green, blue and alpha bytes into their proper places before or'ing them together into a uint32. So you will be performing one multiplication and two shifts per component. In my code that is done with just one shift per component: extracting and shifting them back into the uint32.
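A minimal sketch of the one-shift-per-component idea might look like the following. It is not the code referred to above: the helper names, the 0xAARRGGBB target layout, and the assumption that all four masks are non-zero and 8 bits wide are choices made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Bit index of the lowest set bit of a non-zero mask.
   A plain loop is fine here because it runs once per file, not per pixel. */
static int mask_shift(uint32_t mask)
{
    assert(mask != 0);
    int shift = 0;
    while ((mask & 1u) == 0) { mask >>= 1; ++shift; }
    return shift;
}

/* Move a masked channel from bit position `from` to bit position `to`
   with a single shift in whichever direction is needed. */
static uint32_t shift_to(uint32_t value, int from, int to)
{
    return (from > to) ? (value >> (from - to)) : (value << (to - from));
}

/* Repack one pixel into 0xAARRGGBB. Real code would compute the four
   mask_shift() values once, outside the per-pixel loop. */
static uint32_t repack_argb(uint32_t pixel,
                            uint32_t red_mask, uint32_t green_mask,
                            uint32_t blue_mask, uint32_t alpha_mask)
{
    uint32_t r = shift_to(pixel & red_mask,   mask_shift(red_mask),   16);
    uint32_t g = shift_to(pixel & green_mask, mask_shift(green_mask), 8);
    uint32_t b = shift_to(pixel & blue_mask,  mask_shift(blue_mask),  0);
    uint32_t a = shift_to(pixel & alpha_mask, mask_shift(alpha_mask), 24);
    return a | r | g | b;
}
```

With masks describing an RGBA layout (red in the top byte, alpha in the bottom byte), the pixel 0x11223344 repacks to 0x44112233.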

Edited by Mārtiņš Možeiko on
Thanks for clarifying. I already thought it wasn't the most performant code, but I just remembered the hex divide quirk that lets you do this.
On POSIX platforms (Linux, OS X) the C standard library includes the ffs family of functions, e.g. ffsl().
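As a rough sketch of how ffs() could stand in for the bit-scan loop: ffs() takes an int and returns the 1-based position of the lowest set bit (0 if no bit is set), so one is subtracted to get a shift amount. The wrapper name mask_shift_ffs is made up for this example, and the cast assumes the usual two's-complement conversion from uint32_t to int.

```c
#include <assert.h>
#include <stdint.h>
#include <strings.h> /* POSIX declares ffs() here */

/* Shift amount for a non-zero channel mask, using ffs().
   ffs() is 1-based, so subtract one to get a 0-based bit index. */
static int mask_shift_ffs(uint32_t mask)
{
    assert(mask != 0); /* ffs() returns 0 for zero, which would give -1 here */
    return ffs((int)mask) - 1;
}
```

For the mask 0x0000FF00 this returns 8.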

But without a doubt, this is my favourite unnecessarily complex method. Yes, it works on full 64-bit words. Let me know if anyone wants an explanation.

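#include <stdint.h>

// For each byte of x, sets that byte's high bit if the byte is non-zero.
static inline uint64_t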
sideways_gt_0_8(uint64_t x)
{
    const uint64_t h8 = 0x8080808080808080ull;
    return (((x | h8) - 0x0101010101010101ull) | x) & h8;
}


// Returns the 0-based index of the lowest set bit of x; x must be non-zero.
uint64_t
find_first_set(uint64_t x)
{
    const uint64_t l8 = 0x0101010101010101ull;

    // Popcount by bytes
    uint64_t s = x - ((x & 0xAAAAAAAAAAAAAAAAull) >> 1);
    s = (s & 0x3333333333333333ull) + ((s >> 2) & 0x3333333333333333ull);
    s = ((s + (s >> 4)) & 0x0F0F0F0F0F0F0F0Full);

    // Cumulative sum
    s = s * l8;

    // Find out which is the first non-zero byte.
    uint64_t b = ((sideways_gt_0_8(s) >> 7) * l8) >> 56;
    b = 64 - (b << 3);

    // Extract that byte.
    s = (x >> b) & 0xFF;

    // Cumulative sum the bits
    s = (sideways_gt_0_8(s * l8 & 0x8040201008040201) >> 7) * l8;

    // Work out which is the smallest nonzero bit
    uint64_t res = (sideways_gt_0_8(s) >> 7) * l8 >> 56;

    return b + (8 - res);
}
My favorite link regarding all sorts of bit twiddling tricks: https://graphics.stanford.edu/~seander/bithacks.html

Among other things, there are algorithms for finding how many trailing zero bits a value has, which could be used here.