I'm still thinking my way through this stuff as well, so you have been forewarned...
There is a technical memo that discusses this and derives the same formula that Casey talked about.  In that memo the author describes the "over" operation (Source is drawn over Destination) as not being "closed" for the non-premultiplied alpha formulation, which is a fancy way of saying non-premultiplied colors go in but premultiplied alpha colors come out.  So, why is that?
My current mental model is that the color that is ultimately drawn on the screen is, in fact, a premultiplied alpha color.  The pixels that get drawn on the screen are just ( R, G, B ) in the end - the alpha is only used to figure out those final values.  So what should the final pixel value be for ( R, G, B, A ) drawn over black ( 0, 0, 0, 1 )?  There are various ways to interpret what alpha means, but if we think of it as either opacity or the fraction of the pixel occupied, the drawn pixel would be ( A*R, A*G, A*B ), which is the premultiplied alpha color and the output of the non-premultiplied alpha formula (i.e. non-premultiplied colors go in but premultiplied colors come out and, ultimately, get drawn on the screen).
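To make the "drawn over black" case concrete, here is a minimal sketch in Python.  The function name blend_over_opaque is made up for illustration, and it assumes channels are floats in [0, 1] and that the destination is fully opaque, so the blend is the usual A*S + (1 - A)*D per channel.

```python
def blend_over_opaque(src_rgba, dst_rgb):
    """Draw a non-premultiplied (r, g, b, a) source over an opaque destination.

    Hypothetical helper for illustration; assumes all channels are in [0, 1]
    and the destination alpha is 1.
    """
    r, g, b, a = src_rgba
    # Each output channel is the source weighted by alpha plus the
    # destination weighted by whatever the source leaves uncovered.
    return tuple(a * s + (1.0 - a) * d for s, d in zip((r, g, b), dst_rgb))


black = (0.0, 0.0, 0.0)
print(blend_over_opaque((1.0, 0.5, 0.25, 0.5), black))  # (0.5, 0.25, 0.125)
```

With a black destination the (1 - A)*D term vanishes, so what lands on the screen is exactly ( A*R, A*G, A*B ).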
Anyway, that's the best I've got at the moment.  I'm going to ramble a bit more in the hopes that what I'm going to say is helpful or generates a discussion that is helpful.
One interpretation of alpha is that it is the fraction of a pixel filled by a surface.  So, if we randomly sample a point in a pixel, the probability that we hit the source surface is SA and the probability that we miss the source surface is 1 - SA.  So, thinking about the Source over Destination case again, what is the alpha of the resulting pixel given that we started with DA (destination alpha) and we "painted" a surface over it with SA (source alpha)?
Below, read "Probability" to mean "Probability that a random point sample of the pixel":
SA = PS = Probability hits the Source
DA = PD = Probability hits the Destination (before Source applied over)
A = P = Probability hits either Source or Destination after Source has been applied over
P = PS + (1 - PS)*PD
or
A = SA + (1 - SA)*DA
Which is the formula Casey derived.  In this interpretation, alpha is the probability that a random point sample would hit either the Source or Destination.  We can't simply add SA and DA since that would double count the case where a point sample would hit a Source point over a Destination point.
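Here is a small Python sketch of that alpha formula, mostly to sanity check the double counting argument.  The function names are made up for illustration.  It also assumes, as the (1 - SA)*DA term implicitly does, that where the source covers the pixel is independent of where the destination covers it.

```python
def over_alpha(sa, da):
    """Alpha after Source is drawn over Destination: SA + (1 - SA)*DA."""
    return sa + (1.0 - sa) * da


def over_alpha_via_miss(sa, da):
    """Same quantity computed as 1 minus the probability of missing both.

    Assumes source and destination coverage are independent.
    """
    return 1.0 - (1.0 - sa) * (1.0 - da)


sa, da = 0.25, 0.5
print(over_alpha(sa, da))           # 0.625
print(over_alpha_via_miss(sa, da))  # 0.625
print(sa + da)                      # 0.75, the naive sum that double counts
```

The difference between the naive sum and the real answer is SA*DA, which is the probability that a sample hits a Source point sitting over a Destination point.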
To find the color of the pixel we just apply the probabilities:
or, using premultiplied alphas,