The most basic example is something like:
| size_t count;
int *A = //...;
int *B = //...;
for(size_t i = 0; i < count; i++){
    A[i] += B[i] + constant;
}
|
Each iteration of the loop is independent of the others, so (provided A and B do not overlap) the order of the memory accesses can be shifted around. The compiler could therefore do a partial unroll and transform the loop into:
| size_t count;
int *A = //...;
int *B = //...;
size_t i;
for(i = 0; i+3 < count; i+=4){
    //load four elements from each array
    int A0 = A[i+0];
    int A1 = A[i+1];
    int A2 = A[i+2];
    int A3 = A[i+3];
    int B0 = B[i+0];
    int B1 = B[i+1];
    int B2 = B[i+2];
    int B3 = B[i+3];
    //do the four additions
    A0 += B0 + constant;
    A1 += B1 + constant;
    A2 += B2 + constant;
    A3 += B3 + constant;
    //store the four results
    A[i+0] = A0;
    A[i+1] = A1;
    A[i+2] = A2;
    A[i+3] = A3;
}
//do the last few
for(; i < count; i++){
    A[i] += B[i] + constant;
}
|
In practice the compiler would implement the first for loop using SIMD instructions.
You can do the same transformation by hand on ft_memchr:
| void *ft_memchr(const void *ptr, int value, int num)
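{
    int i;
    const unsigned char *cptr = (const unsigned char *)ptr;
    //memchr compares bytes as unsigned char, so convert the value once
    unsigned char target = (unsigned char)value;
    for(i = 0; i+3 < num; i+=4){
        unsigned char val0 = cptr[i+0];
        unsigned char val1 = cptr[i+1];
        unsigned char val2 = cptr[i+2];
        unsigned char val3 = cptr[i+3];
        if(val0 == target)
            return (void *)&cptr[i+0];
        if(val1 == target)
            return (void *)&cptr[i+1];
        if(val2 == target)
            return (void *)&cptr[i+2];
        if(val3 == target)
            return (void *)&cptr[i+3];
    }
    //check the last few
    for(; i < num; i++){
        if(cptr[i] == target)
            return (void *)&cptr[i];
    }
    return (0);
}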
|
There are ways to implement the cascaded if checks with SIMD.
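As a rough illustration of the idea, here is a portable sketch that uses SWAR (SIMD within a register) rather than real SIMD intrinsics: it tests 8 bytes at a time with a well-known bit trick and only falls back to the byte-by-byte loop once a chunk is known to contain a match. The names `swar_memchr` and `has_zero_byte` are made up for this example, and the `size_t` length parameter follows the standard `memchr` rather than the `int num` used above. Real SIMD code would do the same thing with a vector compare followed by a mask extraction.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: nonzero if and only if some byte of x is zero. */
static uint64_t has_zero_byte(uint64_t x)
{
    return (x - 0x0101010101010101ULL) & ~x & 0x8080808080808080ULL;
}

/* Hypothetical SWAR search, same contract as the standard memchr. */
void *swar_memchr(const void *ptr, int value, size_t num)
{
    const unsigned char *p = ptr;
    unsigned char target = (unsigned char)value;
    /* Replicate the target byte into all 8 lanes of a 64-bit word. */
    uint64_t pattern = 0x0101010101010101ULL * target;
    size_t i = 0;

    for (; i + 8 <= num; i += 8) {
        uint64_t chunk;
        memcpy(&chunk, p + i, sizeof chunk); /* safe unaligned load */
        /* XOR turns every matching byte into zero. */
        if (has_zero_byte(chunk ^ pattern))
            break; /* a match is somewhere in these 8 bytes */
    }
    /* Locate the exact position, or scan the tail if no chunk matched. */
    for (; i < num; i++) {
        if (p[i] == target)
            return (void *)(p + i);
    }
    return NULL;
}
```

The single test per chunk replaces the cascade of eight comparisons; the scalar loop only runs on the chunk that contains the match and on the leftover tail.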
Manual unrolling gets pretty verbose once you go really wide, and the surface area for copy-paste-edit errors grows with it. So it is really important to check the result for correctness and to measure that it actually helps performance.
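One cheap way to do the correctness check is to compare the hand-unrolled version against the standard `memchr` over every length and every byte value on a small buffer. The sketch below assumes a candidate with the standard `memchr` signature; `unrolled_memchr`, `memchr_fn` and `count_mismatches` are names invented for this example. It only covers correctness, so timing still has to be measured separately.

```c
#include <stddef.h>
#include <string.h>

/* Signature shared by the implementations under test (same as memchr). */
typedef void *(*memchr_fn)(const void *, int, size_t);

/* Example candidate: a 4-way unrolled search (hypothetical name). */
void *unrolled_memchr(const void *ptr, int value, size_t num)
{
    const unsigned char *p = ptr;
    unsigned char target = (unsigned char)value;
    size_t i = 0;

    for (; i + 3 < num; i += 4) {
        if (p[i + 0] == target) return (void *)&p[i + 0];
        if (p[i + 1] == target) return (void *)&p[i + 1];
        if (p[i + 2] == target) return (void *)&p[i + 2];
        if (p[i + 3] == target) return (void *)&p[i + 3];
    }
    for (; i < num; i++) {
        if (p[i] == target) return (void *)&p[i];
    }
    return NULL;
}

/* Counts how often the candidate disagrees with the standard memchr. */
size_t count_mismatches(memchr_fn candidate)
{
    unsigned char buf[64];
    size_t mismatches = 0;

    /* Fill with distinct bytes, including values above 127. */
    for (size_t i = 0; i < sizeof buf; i++)
        buf[i] = (unsigned char)(i * 37u + 11u);

    /* Every length exercises a different unrolled-body/tail split. */
    for (size_t len = 0; len <= sizeof buf; len++) {
        for (int value = 0; value < 256; value++) {
            if (candidate(buf, value, len) != memchr(buf, value, len))
                mismatches++;
        }
    }
    return mismatches;
}
```

Calling `count_mismatches(unrolled_memchr)` should return 0; any other result points at an off-by-one in the unrolled body or the tail loop.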