Hey, so I also found an algebraic argument for the way normals transform. Do you have something for this too?
Let's say we have a vector P and its normal N. Then,
$$N \cdot P = N^T P = 0$$
And let's say we apply a transformation T to the vector P, giving $P' = TP$. Then we want to find the transformed normal $N'$ such that:
$$N'^T P' = N'^T T P = 0$$
So suppose there is a transformation T1 that transforms N to $N'$. Then:
$$N'^T P' = (T_1 N)^T (T P) = N^T T_1^T T P$$
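So if we want to keep the dot product 0, then it's sufficient that
$$T_1^T T = I \quad\Longrightarrow\quad T_1 = (T^{-1})^T$$
because then $N'^T P' = N^T I P = N^T P = 0$. This requires T to be invertible.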
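To check the rule $T_1 = (T^{-1})^T$ numerically, here is a minimal pure-Python sketch. It assumes 3x3 matrices stored as nested lists, and the helper names (`matvec`, `inverse3`, `normal_matrix`, and so on) are illustrative only, not from any library. The example T is a shear, picked because it is neither a rotation nor a scaling, so transforming N by T itself visibly fails.

```python
# Minimal 3x3 helpers on nested lists (pure Python, no dependencies).
def matvec(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]


def transpose(m):
    return [[m[j][i] for j in range(3)] for i in range(3)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def inverse3(m):
    """Invert a 3x3 matrix via its adjugate; raise if it is singular."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if det == 0:
        raise ValueError("T is singular, so (T^-1)^T does not exist")
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[x / det for x in row] for row in adj]


def normal_matrix(t):
    """T1 = (T^-1)^T: the matrix that keeps normals perpendicular."""
    return transpose(inverse3(t))


# A shear (y += x): neither a rotation nor a pure scaling.
T = [[1, 0, 0],
     [1, 1, 0],
     [0, 0, 1]]
P = [1, 0, 0]  # a vector in the surface
N = [0, 1, 0]  # its normal, dot(N, P) == 0

P_new = matvec(T, P)                   # [1, 1, 0]
naive = matvec(T, N)                   # N transformed by T itself
correct = matvec(normal_matrix(T), N)  # N transformed by T1

print(dot(naive, P_new))    # 1   (no longer perpendicular)
print(dot(correct, P_new))  # 0.0 (still perpendicular)
```

In graphics code this inverse transpose is commonly precomputed once per transform and called the normal matrix.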
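Therefore, if we transform a vector by the matrix T, the normal should be transformed by T1 to keep the dot product 0, and hence keep the transformed normal perpendicular to the transformed vector. (Incidentally, because $N^T T_1^T T P = N^T P$ for any N, this keeps the dot product unchanged for any vector transformed this way, not just normals. The angle itself is generally not preserved, though, because both vectors change length; only a dot product of 0, i.e. a right angle, is guaranteed to carry over.)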
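The aside about other angles can be checked with a small sketch. It assumes a hypothetical stretch T = diag(2, 1, 1), so that T1 = diag(0.5, 1, 1), and a vector V at 45 degrees to P that is not a normal; the helper names are illustrative.

```python
import math


def matvec(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def angle_deg(a, b):
    """Angle between two nonzero vectors, in degrees."""
    return math.degrees(math.acos(dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))))


T = [[2, 0, 0], [0, 1, 0], [0, 0, 1]]     # stretch x by 2
T1 = [[0.5, 0, 0], [0, 1, 0], [0, 0, 1]]  # (T^-1)^T for this diagonal T

P = [1, 0, 0]
V = [1, 1, 0]  # an arbitrary vector at 45 degrees to P (not a normal)

P_new, V_new = matvec(T, P), matvec(T1, V)

print(dot(V, P), dot(V_new, P_new))  # 1 1.0  (dot product preserved)
print(round(angle_deg(V, P), 2), round(angle_deg(V_new, P_new), 2))  # 45.0 63.43
```

The dot product survives, but the angle moves from 45 to about 63.43 degrees because T stretches P while T1 shrinks V.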
Transformation is a rotation
Now, if the transformation T is a rotation matrix R, then:
$$P' = R P$$
$$N' = (R^{-1})^T N$$
And for rotation matrices we know that:
$$R^{-1} = R^T$$
So,
$$N' = (R^{-1})^T N = (R^T)^T N = R N$$
This means the normal N transforms by the same rotation matrix R as the vector P.
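A quick numerical check of the rotation case, as a sketch: it assumes a rotation of 30 degrees about the z-axis (an arbitrary choice) and reuses a hypothetical adjugate-based `inverse3` helper on nested lists, comparing the brute-force inverse against $R^T$.

```python
import math


def transpose(m):
    return [[m[j][i] for j in range(3)] for i in range(3)]


def inverse3(m):
    """Invert a 3x3 matrix via its adjugate; raise if it is singular."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if det == 0:
        raise ValueError("matrix is singular")
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[x / det for x in row] for row in adj]


def close(a, b, tol=1e-12):
    """Element-wise comparison of two 3x3 matrices with a float tolerance."""
    return all(abs(x - y) <= tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


theta = math.radians(30)
c, s = math.cos(theta), math.sin(theta)
R = [[c, -s, 0],
     [s, c, 0],
     [0, 0, 1]]  # rotation by 30 degrees about the z-axis

print(close(inverse3(R), transpose(R)))  # True: R^-1 = R^T
print(close(transpose(inverse3(R)), R))  # True: (R^-1)^T = R
```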
Transformation is scaling
For scaling we multiply by a diagonal matrix S whose diagonal elements are the scaling factors. So,
$$P' = S P, \qquad N' = (S^{-1})^T N$$
Now scaling matrices will be diagonal matrices like
$$S = \begin{pmatrix} s_x & 0 & 0 \\ 0 & s_y & 0 \\ 0 & 0 & s_z \end{pmatrix}$$
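and, provided none of the scaling factors is zero, their inverse will be
$$S^{-1} = \begin{pmatrix} 1/s_x & 0 & 0 \\ 0 & 1/s_y & 0 \\ 0 & 0 & 1/s_z \end{pmatrix}$$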
and since transposing a diagonal matrix leaves it unchanged, we have
$$N' = (S^{-1})^T N = S^{-1} N$$
General transformation
So for a general transformation involving both rotation and scaling, given by the product of transformation matrices (each $M_i$ being either a rotation matrix or a scaling matrix):
$$T = M_1 M_2 \cdots M_k$$
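The normal must be transformed as
$$N' = (T^{-1})^T N = \left((M_1 M_2 \cdots M_k)^{-1}\right)^T N$$
or
$$N' = \left(M_k^{-1} \cdots M_2^{-1} M_1^{-1}\right)^T N$$
$$N' = (M_1^{-1})^T (M_2^{-1})^T \cdots (M_k^{-1})^T N$$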
So to transform normals while keeping them normal, we just invert every diagonal element of the scaling matrices, use the rotation matrices as they are, and multiply them in the same order as in T.
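To check the general result numerically, this sketch builds an arbitrary chain T = R1 S1 R2 S2 (the angles and scale factors are made up for illustration) and compares the brute-force $(T^{-1})^T$ against the factored form with the rotations kept and each scaling's diagonal inverted, in the same order. All helpers are hypothetical pure-Python ones on nested lists.

```python
import math


def matmul(a, b):
    """Product of two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def matvec(m, v):
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]


def transpose(m):
    return [[m[j][i] for j in range(3)] for i in range(3)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def inverse3(m):
    """Invert a 3x3 matrix via its adjugate (used only as a reference here)."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if det == 0:
        raise ValueError("matrix is singular")
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[x / det for x in row] for row in adj]


def rot_z(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]


def rot_x(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[1, 0, 0], [0, c, -s], [0, s, c]]


def scale(sx, sy, sz):
    return [[sx, 0, 0], [0, sy, 0], [0, 0, sz]]


def product(*ms):
    """Multiply matrices left to right: product(A, B, C) = A B C."""
    out = scale(1, 1, 1)  # identity
    for m in ms:
        out = matmul(out, m)
    return out


def close(a, b, tol=1e-9):
    return all(abs(x - y) <= tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# T = R1 S1 R2 S2 (an arbitrary example chain).
R1, S1, R2, S2 = rot_z(30), scale(2, 1, 0.5), rot_x(45), scale(1, 3, 1)
T = product(R1, S1, R2, S2)

direct = transpose(inverse3(T))  # (T^-1)^T, computed by brute force
# Same order, rotations kept, each scaling's diagonal entries inverted:
factored = product(R1, scale(1 / 2, 1 / 1, 1 / 0.5), R2, scale(1 / 1, 1 / 3, 1 / 1))

print(close(direct, factored))  # True

P, N = [1, 2, 0], [2, -1, 5]  # dot(N, P) == 0
print(abs(dot(matvec(factored, N), matvec(T, P))) < 1e-9)  # True
```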