Cache-friendly lists

cubercaleb
btaylor2401
= sum((x*x - 2*x*u + u*u)) / n
= (sum(x*x) - sum(2*x*u) + sum(u*u)) / n
Since subtraction is anticommutative I don't think that you can make that jump. But I am not entirely sure.

-2xu = -1*2xu
It's just pulling out a constant multiplier, which comes out in front of a summation. (Because ab+ac = a(b+c)). This is the same rule that lets me pull out the 2u on line 4.
cubercaleb
Since subtraction is anticommutative I don't think that you can make that jump. But I am not entirely sure.


You can view subtraction as being the composition of addition and multiplication by -1, i.e. 1 - 2 - 3 = 1 + (-1)*2 + (-1)*3 = 1 + (-1) * (2 + 3). So that step is valid.

Edit: Also wow we gotta add nested quotes haha

Edited by Andrew Chronister on
btaylor2401
It's just pulling out a constant multiplier, which comes out in front of a summation. (Because ab+ac = a(b+c)). This is the same rule that lets me pull out the 2u on line 4.
Huhh, that somehow works, Nice find. But...
btaylor2401
var = sum(x*x)/n - sum(x)^2
You need to divide by (n-1) in order for this to be the variance, as implied by the final stddev equation you wrote.
Anyways, while this is awesome my concern is that for floats this may not be as accurate, especially if the x values you are dealing with are farther away from 0 (which they are for timings). With the typical stddev formula you subtract each value by the mean, which brings each term closer to 0 before performing the square, this mitigating this problem to some extent.

Edited by Caleb on
ChronalDragon
Edit: Also wow we gotta add nested quotes haha
I think the ability to have multi quote and reply as well, plus being able to see the previous posts while writing your own replay would be nice. Having a "preview" button would also be nice to see how the formatting of the post looks before hand.
cubercaleb
You need to divide by (n-1) in order for this to be the variance, as implied by the final stddev equation you wrote.

The (n-1) is a correction for sample variance instead of population variance. Both versions are correct, they just refer to different things. (In practice, you basically never have the entire population to look at, so the first version is rarely used.)
Anyways, while this is awesome my concern is that for floats this may not be as accurate, especially if the x values you are dealing with are farther away from 0 (which they are for timings). With the typical stddev formula you subtract each value by the mean, which brings each term closer to 0 before performing the square, this mitigating this problem to some extent.
Hence my sample code using double instead of float. There are alternate methods of computing variance that are more numerically stable, but it's unlikely to matter in this usage. (53 bits of precision is a lot, to show a noticeable effect on accuracy you'd need a test run spanning *days*, if not weeks.)