Let's start by understanding an auto-encoder. An auto-encoder neural network is an unsupervised learning algorithm that applies backpropagation, setting the target values equal to the inputs, i.e., it uses $y^{(i)} = x^{(i)}$.
Here is an auto-encoder:
Figure 1: Auto-encoder
Here, $\hat{x}$ is the reconstruction of the input $x$. The identity function seems a particularly trivial function to try to learn; but by placing constraints on the network, such as limiting the number of hidden units, we can discover interesting structure in the data.
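To make this concrete, here is a minimal sketch of the forward pass of such an auto-encoder in NumPy. The layer sizes (64 inputs for an 8x8 patch, 25 hidden units) and the random initialization are illustrative assumptions, not values from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy auto-encoder: 64 inputs -> 25 hidden units -> 64 outputs.
# The bottleneck (25 < 64) is the constraint that forces the network
# to learn structure rather than a trivial identity mapping.
n_in, n_hid = 64, 25
W1 = rng.normal(0, 0.01, (n_hid, n_in)); b1 = np.zeros(n_hid)
W2 = rng.normal(0, 0.01, (n_in, n_hid)); b2 = np.zeros(n_in)

def forward(x):
    a2 = sigmoid(W1 @ x + b1)      # hidden activations (the "code")
    x_hat = sigmoid(W2 @ a2 + b2)  # reconstruction of the input
    return a2, x_hat

x = rng.uniform(0, 1, n_in)        # a stand-in 8x8 patch, flattened
a2, x_hat = forward(x)

# Since the target is the input itself, the reconstruction cost is
# measured between x_hat and x:
cost = 0.5 * np.sum((x_hat - x) ** 2)
```

Training would then use backpropagation to minimize this reconstruction cost over many patches; only the forward pass is sketched here.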
The argument above relied on the number of hidden units $s_{2}$ being small. But even when the number of hidden units is large (perhaps even greater than the number of input pixels), we can still discover interesting structure by imposing other constraints on the network. In particular, if we impose a sparsity constraint on the hidden units, the auto-encoder will still discover interesting structure in the data.
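A common way to impose such a sparsity constraint is to penalize the average activation of each hidden unit for deviating from a small target value $\rho$, using a KL-divergence term added to the cost. A sketch, with the target sparsity 0.05 and the hidden-layer size chosen purely for illustration:

```python
import numpy as np

def kl_sparsity_penalty(rho, rho_hat):
    """KL divergence between the desired average activation rho and the
    observed average activations rho_hat (one entry per hidden unit)."""
    return np.sum(rho * np.log(rho / rho_hat)
                  + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))

rho = 0.05                         # target: each unit active ~5% of the time

# If every hidden unit's average activation already equals rho,
# the penalty is zero.
penalty_sparse = kl_sparsity_penalty(rho, np.full(25, 0.05))

# If the units fire half the time, the penalty is large, pushing the
# optimizer toward sparser hidden codes.
penalty_dense = kl_sparsity_penalty(rho, np.full(25, 0.5))
```

This penalty is added, with a weighting coefficient, to the reconstruction cost before running backpropagation.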
After implementing the sparse auto-encoder algorithm, with the sigmoid function as the activation function of both the Layer 2 (hidden) and Layer 3 (output) neurons, on the following B/W natural images:
Figure 2: Pre-processed B/W natural image - Sample 1
Figure 3: Pre-processed B/W natural image - Sample 2
We learn features as shown below.
Figure 4: Edge-like features learned over 8x8 random samples of images
The sparse auto-encoder learns features similar to the Receptive Fields (RFs) of V1, the primary visual cortex of the brain; one may call them edge-like features.
But when we try to learn features of artificial images (ones we don't find in nature), we obtain different kinds of features. For example, consider learning features of handwritten digits such as these:
Figure 5: Handwritten digits sampled from MNIST dataset
We learn features that look like pen strokes, as shown below.
Figure 6: Pen strokes learned over handwritten digits
As can be seen, these features make perfect sense. The brain computes similar low-level features: the oriented RFs of V1 are built by combining the LGN RFs, the center-surround receptive fields originating in the retina.
Figure 7: Visual system in the brain (eye to LGN to V1)
By using a sparse auto-encoder on the following color images, with the sigmoid function as the activation of the Layer 2 (hidden) neurons and a linear function as the activation of the Layer 3 (output) neurons:
Figure 8: Color images sampled from STL-10 dataset
We obtain color features that look like color edges.
Figure 9: Colored and B/W edge-like patch features learned over 8x8 random samples of images
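The change described above, keeping the hidden layer sigmoid but making the output layer linear, can be sketched as follows. The patch size (8x8 RGB, so 192 inputs) and the hidden-layer size are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# 8x8 RGB patch flattened: 192 inputs; sigmoid hidden layer, linear output.
n_in, n_hid = 192, 100
W1 = rng.normal(0, 0.01, (n_hid, n_in)); b1 = np.zeros(n_hid)
W2 = rng.normal(0, 0.01, (n_in, n_hid)); b2 = np.zeros(n_in)

def forward(x):
    a2 = sigmoid(W1 @ x + b1)  # Layer 2 (hidden): sigmoid
    x_hat = W2 @ a2 + b2       # Layer 3 (output): linear, no squashing
    return x_hat

# A linear output can reproduce inputs outside [0, 1] (e.g. whitened
# pixel values), which a sigmoid output layer could not.
x = rng.normal(0, 1, n_in)
x_hat = forward(x)
```

This "linear decoder" variant is what allows the reconstruction targets to be real-valued rather than confined to the sigmoid's (0, 1) range.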
These edge-like and pen-stroke-like features generalize well and are consistent across natural images and artificial digit images, which leads to high classification accuracy. They are low-level features; higher-level features can be obtained by combining them.
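One way to combine low-level features into higher-level ones is to stack auto-encoders: the hidden activations of the first become the input to a second. A minimal sketch, with all layer sizes chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def layer(n_out, n_in):
    return rng.normal(0, 0.01, (n_out, n_in)), np.zeros(n_out)

# First encoder: pixels -> low-level (edge-like) features.
W1, b1 = layer(25, 64)
# Second encoder: low-level features -> higher-level combinations.
W2, b2 = layer(10, 25)

x = rng.uniform(0, 1, 64)    # a flattened 8x8 patch
h1 = sigmoid(W1 @ x + b1)    # low-level code
h2 = sigmoid(W2 @ h1 + b2)   # high-level code built from low-level features
```

In practice each encoder would be trained on its own reconstruction task (greedy layer-wise training) before being stacked; only the stacked forward pass is shown here.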