SIMD: drawRectangle

I changed the drawRectangle function to use the Intel intrinsics, but when a rectangle's size changes by a small amount, it doesn't scale smoothly. I guess this is because when width % 4 != 0, up to 3 pixels aren't drawn.

Is there an elegant solution to this bug, or is it a compromise between speed and the flexibility of being able to render rectangles of any width?
What do you mean by "don't scale smoothly"? The DrawRectangle function never scales anything - it just fills a rectangular area with the same color value.

You can handle the case where the width is not a multiple of 4 in several ways:

1) Do the same as Casey did with WriteMask. You calculate all 4 lanes, then load what is already in the destination, mask both values, OR them together, and then store.

2) Split the loop into two parts - the first part processes pixels in full 4-wide chunks, the second part handles the tail with masking. Or three parts - first handle the prefix with masking, then loop with full 4-wide stores, then handle the suffix with masking.
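
A rough sketch of how the two ideas combine, using SSE2: full 4-wide stores in the main loop, then a masked read-modify-write for the tail. FillRow and its parameters are made-up names for illustration, not the actual Handmade Hero code. It assumes 32-bit pixels and that the memory up to the next multiple of 4 pixels past the end of the row is valid to read and write (those extra pixels are written back unchanged).

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: fills Count pixels starting at Row with Color.
   Assumes the buffer is readable and writable up to 3 pixels past
   Row + Count, because the tail does a full 4-wide load and store. */
static void FillRow(uint32_t *Row, int Count, uint32_t Color)
{
    __m128i Color4x = _mm_set1_epi32((int)Color);
    __m128i LaneIndex = _mm_setr_epi32(0, 1, 2, 3);

    int X = 0;

    /* Main loop: full 4-wide stores, no masking needed. */
    for (; X + 4 <= Count; X += 4)
    {
        _mm_storeu_si128((__m128i *)(Row + X), Color4x);
    }

    /* Tail: 0 to 3 pixels are left over. */
    int Remaining = Count - X;
    if (Remaining > 0)
    {
        /* A lane of WriteMask is all ones where LaneIndex < Remaining. */
        __m128i WriteMask = _mm_cmplt_epi32(LaneIndex, _mm_set1_epi32(Remaining));

        /* Load what is already there, keep it where the mask is zero,
           take the new color where the mask is set, then store. */
        __m128i Dest = _mm_loadu_si128((__m128i *)(Row + X));
        __m128i Out = _mm_or_si128(_mm_and_si128(WriteMask, Color4x),
                                   _mm_andnot_si128(WriteMask, Dest));
        _mm_storeu_si128((__m128i *)(Row + X), Out);
    }
}
```

For example, with Count = 6 this does one full store for the first 4 pixels and one masked store that changes only the last 2. If the left edge can also start at an arbitrary X and you want aligned stores in the main loop, the same mask trick applies to the prefix.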

Edited by Mārtiņš Možeiko on
My 'bug' was similar to what happened here in HH.