Classification by multilayer neural networks depends on the emergence of appropriate features in the early hidden layers, such that the representations become linearly separable in the penultimate layer. By using hidden layers with only two or three units, the representational structure of the intermediate layers can be visualized directly, and the time course of learning can be rendered as animations of the evolving hidden-layer representations. The visualizations reveal a tendency for the hidden-unit image of the input space to collapse onto a nonlinear (warped) manifold of lower dimensionality; that is, the weight matrix becomes (nearly) singular. A task that is not linearly separable in the input space is rendered linearly separable by this warping of the manifold. Because of the matrix singularity, deeper layers of the network lack the information needed to reconstruct the input in its original form; consequently, the deep layers cannot discriminate certain distinct stimulus patterns.
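The phenomenon described above can be sketched with a minimal, hand-constructed example (not the networks trained in this work): a 2-2-1 network solving XOR with hypothetical, hand-set weights whose first-layer matrix is singular (rank 1). XOR is not linearly separable in the input space, yet the hidden-layer image of the inputs is linearly separable; at the same time, the singular weight matrix maps the distinct inputs (0,1) and (1,0) onto the same hidden point, so no deeper layer can tell them apart.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical hand-set weights for a 2-2-1 XOR network.
# Both hidden units use the same input direction (1,1), so W1 is rank 1.
W1 = np.array([[20.0, 20.0],
               [20.0, 20.0]])
b1 = np.array([-10.0, -30.0])   # unit 1 ~ OR(x1,x2), unit 2 ~ AND(x1,x2)
W2 = np.array([20.0, -20.0])    # output ~ OR and not AND = XOR
b2 = -10.0

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
H = sigmoid(X @ W1 + b1)        # hidden-layer image of the input space
y = sigmoid(H @ W2 + b2)        # network output, approx [0, 1, 1, 0]

# W1 is singular, so the hidden image collapses onto a 1-D curve:
# (0,1) and (1,0) land on exactly the same hidden point, and the
# deeper layer cannot reconstruct or discriminate them.
rank = np.linalg.matrix_rank(W1)                 # 1
collapsed = np.allclose(H[1], H[2])              # True
```

The collapse here is exact rather than merely approximate, which makes the trade-off explicit: the same warping that renders the task linearly separable in the hidden layer also destroys the information needed to distinguish the two positive inputs.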