On day 372 Casey had some negative things to say about VAOs. However, I don't think he was entirely correct, at least not according to AMD's OpenGL guy, Graham Sellers. He made a post about it here, and he posted some additional comments about it (and other OpenGL functionality) here. The benchmarks he ran show that multiple VAOs that are never modified are always faster than a single global VAO that is modified.
Also, it seems as if Casey only has one vertex format anyway, so he could push the glEnableVertexAttribArray and glVertexAttribPointer calls all the way up to the creation of the VAO, although he might be rendering something else that needs a different format.
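To make that concrete, here is a rough sketch of what I mean, not Casey's actual code. The Vertex struct, create_vao and draw_mesh are names I made up for illustration. The gl* functions at the top are only counting stand-ins so the snippet compiles without a GL context. In a real program you would delete that section and use the declarations from your OpenGL headers/loader.

```c
#include <assert.h>
#include <stddef.h>

/* --- Stand-ins so the sketch compiles without a GL context. ---
   ASSUMPTION: in a real program, delete this section and use the
   declarations from your OpenGL headers/loader instead. */
typedef unsigned int GLuint;
typedef int GLint;
typedef int GLsizei;
typedef unsigned int GLenum;
typedef unsigned char GLboolean;
#define GL_FLOAT        0x1406
#define GL_FALSE        0
#define GL_ARRAY_BUFFER 0x8892
#define GL_TRIANGLES    0x0004

static int attrib_setup_calls; /* counts enable + pointer calls */
static int draw_calls;         /* counts glDrawArrays calls */
static GLuint bound_vao;       /* last VAO passed to glBindVertexArray */

static void glGenVertexArrays(GLsizei n, GLuint *arrays)
{
    static GLuint next_name = 1;
    while (n-- > 0) *arrays++ = next_name++;
}
static void glBindVertexArray(GLuint array) { bound_vao = array; }
static void glBindBuffer(GLenum target, GLuint buffer) { (void)target; (void)buffer; }
static void glEnableVertexAttribArray(GLuint index) { (void)index; attrib_setup_calls++; }
static void glVertexAttribPointer(GLuint index, GLint size, GLenum type,
                                  GLboolean normalized, GLsizei stride,
                                  const void *pointer)
{
    (void)index; (void)size; (void)type;
    (void)normalized; (void)stride; (void)pointer;
    attrib_setup_calls++;
}
static void glDrawArrays(GLenum mode, GLint first, GLsizei count)
{
    (void)mode; (void)first; (void)count;
    draw_calls++;
}
/* --- End of stand-ins. --- */

/* Hypothetical single vertex format: position + texture coordinate. */
typedef struct { float pos[3]; float uv[2]; } Vertex;

/* Done once, at creation: the VAO records the attribute layout. */
static GLuint create_vao(GLuint vertex_buffer)
{
    GLuint vao = 0;
    glGenVertexArrays(1, &vao);
    glBindVertexArray(vao);
    glBindBuffer(GL_ARRAY_BUFFER, vertex_buffer);

    glEnableVertexAttribArray(0);
    glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, (GLsizei)sizeof(Vertex),
                          (const void *)offsetof(Vertex, pos));
    glEnableVertexAttribArray(1);
    glVertexAttribPointer(1, 2, GL_FLOAT, GL_FALSE, (GLsizei)sizeof(Vertex),
                          (const void *)offsetof(Vertex, uv));

    glBindVertexArray(0);
    return vao;
}

/* Done every frame: bind and draw, with no attribute calls at all. */
static void draw_mesh(GLuint vao, GLsizei vertex_count)
{
    assert(vao != 0);
    glBindVertexArray(vao);
    glDrawArrays(GL_TRIANGLES, 0, vertex_count);
}
```

With this layout the per-frame path never touches glEnableVertexAttribArray or glVertexAttribPointer. If something else needs a different vertex format, it would just get its own VAO built the same way.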
Regards
elFarto