Hi there,
I am following handmade hero and I found something interesting on day 127.
Casey aligns the fill rectangle so that MinX and MaxX are both aligned on a four-pixel-boundary.
He sets the StartClipMask and EndClipMask accordingly to avoid changing the pixels outside the _original_ fill boundaries.
Here is the code that will align and set the masks:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30 | __m128i StartClipMask = _mm_set1_epi8(-1);
__m128i EndClipMask = _mm_set1_epi8(-1);
__m128i StartClipMasks[] =
{
_mm_slli_si128(StartClipMask, 0*4),
_mm_slli_si128(StartClipMask, 1*4),
_mm_slli_si128(StartClipMask, 2*4),
_mm_slli_si128(StartClipMask, 3*4),
};
__m128i EndClipMasks[] =
{
_mm_srli_si128(EndClipMask, 0*4),
_mm_srli_si128(EndClipMask, 3*4),
_mm_srli_si128(EndClipMask, 2*4),
_mm_srli_si128(EndClipMask, 1*4),
};
if(FillRect.MinX & 3)
{
StartClipMask = StartClipMasks[FillRect.MinX & 3];
FillRect.MinX = FillRect.MinX & ~3;
}
if(FillRect.MaxX & 3)
{
EndClipMask = EndClipMasks[FillRect.MaxX & 3];
FillRect.MaxX = (FillRect.MaxX & ~3) + 4;
}
|
Here is the code that will do the masking:
| __m128 U = _mm_add_ps(_mm_mul_ps(PixelPx, nXAxisx_4x), PynX);
__m128 V = _mm_add_ps(_mm_mul_ps(PixelPx, nYAxisx_4x), PynY);
__m128i WriteMask = _mm_castps_si128(_mm_and_ps(_mm_and_ps(_mm_cmpge_ps(U, Zero),
_mm_cmple_ps(U, One)),
_mm_and_ps(_mm_cmpge_ps(V, Zero),
_mm_cmple_ps(V, One))));
WriteMask = _mm_and_si128(WriteMask, ClipMask); // HERE
|
The question is why are we doing that?
We're already clipping to the texture boundaries using the U,V coordinates and masking accordingly. Right?!
Expanding the fill rectangle will cause the pixels to fall outside the texture and get masked automatically.
We have done that before. We used to iterate over the entire output buffer pixels and that just worked.
To eliminate the suspicions I modified the code slightly and added an assert:
| __m128 U = _mm_add_ps(_mm_mul_ps(PixelPx, nXAxisx_4x), PynX);
__m128 V = _mm_add_ps(_mm_mul_ps(PixelPx, nYAxisx_4x), PynY);
__m128i WriteMask = _mm_castps_si128(_mm_and_ps(_mm_and_ps(_mm_cmpge_ps(U, Zero),
_mm_cmple_ps(U, One)),
_mm_and_ps(_mm_cmpge_ps(V, Zero),
_mm_cmple_ps(V, One))));
__m128i WriteMask2 = _mm_and_si128(WriteMask, ClipMask);
Assert(memcmp(&WriteMask, &WriteMask2, sizeof(WriteMask)) == 0);
|
And everything is fine.
I checked later days source code to see if it has changed but it is still the same.
Am I missing something?