One thing you might want to try is to use a density gradient estimate at the location of the particle, rather than just at the cel in which the particle lives.
To understand what's going on in your code, consider this code snippet:
 Dispersion += Dc*(CelCenter>Density  CelLeft>Density)*V3(1.0f, 0.0f, 0.0f);
Dispersion += Dc*(CelCenter>Density  CelRight>Density)*V3(1.0f, 0.0f, 0.0f);

This is the same as:
 Dispersion += Dc*CelCenter>Density*V3(1.0f, 0.0f, 0.0f)
Dispersion = Dc*CelLeft>Density*V3(1.0f, 0.0f, 0.0f);
Dispersion += Dc*CelCenter>Density*V3(1.0f, 0.0f, 0.0f);
Dispersion = Dc* CelRight>Density)*V3(1.0f, 0.0f, 0.0f);

Two of the terms cancel to give:
 Dispersion += Dc*(CelLeft>Density  CelRight>Density)*V3(1.0f, 0.0f, 0.0f);

A similar cancellation happens in the y direction. The upshot of this is that the density at CelCenter is never actually used. Only four density samples are being used, not five, and the one that's ignored is the one that's closest to the particle!
I can't remember how HMH did this, and don't have time to look it up right now, but suppose we record densities this way:
 for (unsigned i = 0; i < NumParticles; ++i) {
int X = (int)Particles[i].x;
int Y = (int)Particles[i].y;
++&GameState>ParticleCels[Y][X];
}

(I know there's a scale and offset applied to the x and y coordinates; I'm ignoring that for simplicity.)
The cel at integer coordinates (X,Y) records all particles whose x coordinate is in the range [X,X+1) and y coordinate is in the range [Y,Y+1). So you can think of the cel as a square centred at (X+0.5,Y+0.5). In a sense, this point is where the density is defined.
So in the same way that we did texture mapping, you can calculate the density gradient at a point using bilinear interpolation on the four surrounding samples:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19  // WARNING: Untested code follows!
int X = (int)(Particle>x  0.5f);
int Y = (int)(Particle>y  0.5f);
float dx = Particle>x  0.5f  X;
float dy = Particle>y  0.5f  Y;
particle_cel *CelLeftDown = &GameState>ParticleCels[Y][X];
particle_cel *CelRightDown = &GameState>ParticleCels[Y][X+1];
particle_cel *CelLeftUp = &GameState>ParticleCels[Y+1][X];
particle_cel *CelRightUp = &GameState>ParticleCels[Y+1][X+1];
float dDdx0 = CellRightDown>Density * dx + CellLeftDown>Density * (1.0f  dx);
float dDdx1 = CellRightUp>Density * dx + CellLeftUp>Density * (1.0f  dx);
float dDdx = dDdx0 * (1.0f  dy) + dDdx1 * dy;
float dDdy0 = CellLeftUp>Density * dy + CellLeftDown>Density * (1.0f  dy);
float dDdy1 = CellRightUp>Density * dy + CellRightDown>Density * (1.0f  dy);
float dDdy = dDdy0 * (1.0f  dx) + dDdy1 * dx;
v3 Dispersion = Dc * (dDdx * v3(1,0,0) + dDdy * v3(0,1,0);

So you're using the same number of density samples (four), but you're actually using the four that are closest to the particle this time. This should avoid numeric smoothing.
If you're curious and want to get into the vector calculus a bit more, I recommend a tutorial on the MAC grid method.
This one looks pretty good.