Since Windows Vista, DirectSound is no longer hardware accelerated: it is emulated in software and mixed by the Windows audio engine (the old kernel mixer, KMixer, was removed in Vista). You can find some information on MSDN and Wikipedia.
From what I remember, the gap between the write cursor and the play cursor is the minimum latency you can achieve, and it is about 30 ms. The smallest change in the cursor position is 1/100 s worth of samples.
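To put numbers on this, here is a minimal sketch of turning the two cursor values into a latency figure. It assumes you already have the play and write cursors as byte offsets (the two values that `GetCurrentPosition` reports) and that you know the buffer size and byte rate. The helper names `ringDistance` and `bytesToMs` are made up for illustration and are not part of DirectSound.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: bytes from `from` forward to `to` in a circular
// buffer of `bufferBytes`, handling the wrap at the end of the buffer.
std::uint32_t ringDistance(std::uint32_t from, std::uint32_t to,
                           std::uint32_t bufferBytes) {
    assert(bufferBytes > 0 && from < bufferBytes && to < bufferBytes);
    return (to >= from) ? (to - from) : (bufferBytes - from + to);
}

// Hypothetical helper: converts a byte count to milliseconds for PCM audio.
// bytesPerSecond = sampleRate * channels * bytesPerSample.
double bytesToMs(std::uint32_t bytes, std::uint32_t bytesPerSecond) {
    assert(bytesPerSecond > 0);
    return 1000.0 * bytes / bytesPerSecond;
}
```

For 44.1 kHz 16-bit stereo the byte rate is 176400 bytes/s, so a 30 ms gap is 5292 bytes and a 1/100 s cursor step is 1764 bytes. Calling `bytesToMs(ringDistance(playCursor, writeCursor, bufferBytes), 176400)` on your own readings should show whether you see the same gap.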
Calling GetCurrentPosition later would not help either, since you still need to write the samples between the end of your last write and the point where you expect the next write to begin.
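As a rough illustration of that bookkeeping, the sketch below computes how many bytes to write on each update so that the buffer stays filled up to a fixed lookahead past the write cursor. Everything here is an assumption for the example: `lastWriteEnd` is an offset you track yourself, `lookaheadBytes` is however much audio you want queued until your next update, and the function names are hypothetical. It also assumes no underrun has happened, i.e. `lastWriteEnd` is still at or ahead of the write cursor.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: forward distance in a circular buffer.
std::uint32_t forwardBytes(std::uint32_t from, std::uint32_t to,
                           std::uint32_t bufferBytes) {
    assert(bufferBytes > 0 && from < bufferBytes && to < bufferBytes);
    return (to >= from) ? (to - from) : (bufferBytes - from + to);
}

// Hypothetical helper: bytes to append at `lastWriteEnd` so that
// `lookaheadBytes` of audio are queued past the write cursor.
// Assumes lastWriteEnd has not been overtaken by the write cursor.
std::uint32_t bytesToWrite(std::uint32_t lastWriteEnd,
                           std::uint32_t writeCursor,
                           std::uint32_t lookaheadBytes,
                           std::uint32_t bufferBytes) {
    assert(lookaheadBytes < bufferBytes);
    // Audio already queued ahead of the write cursor.
    std::uint32_t queued = forwardBytes(writeCursor, lastWriteEnd, bufferBytes);
    return (queued >= lookaheadBytes) ? 0 : (lookaheadBytes - queued);
}
```

The point is that the amount to write depends on where your last write ended and how long it will be until the next one, not on when exactly you sample the cursors.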