


Learning features with Sparse Auto-encoders

[Code-SparseAE] [Code-Linear Decoders]

Let's start by understanding what an auto-encoder is. An auto-encoder neural network is an unsupervised learning algorithm that applies backpropagation with the target values set equal to the inputs, i.e. it uses $y^{(i)} = x^{(i)}$.

Here is an auto-encoder:

autoencoder

Figure 1: Auto-encoder


Here, $\hat{x}$ is the reconstruction of the input $x$. In other words, the network tries to learn an approximation of the identity function, so that $\hat{x} \approx x$. The identity function seems a particularly trivial function to be trying to learn, but by placing constraints on the network, such as limiting the number of hidden units, we can discover interesting structure in the data.
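To make the idea concrete, here is a minimal NumPy sketch of the forward pass and reconstruction cost of a three-layer auto-encoder. The layer sizes, the random initialization, and the names `autoencoder_forward` and `reconstruction_cost` are illustrative assumptions, not the code linked above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def autoencoder_forward(x, W1, b1, W2, b2):
    """x has shape (n_inputs, m): one column per example."""
    a2 = sigmoid(W1 @ x + b1)      # Layer 2 (hidden) activations
    x_hat = sigmoid(W2 @ a2 + b2)  # Layer 3 (output): reconstruction of x
    return a2, x_hat

def reconstruction_cost(x, x_hat):
    """Average squared error, using the input itself as the target (y = x)."""
    m = x.shape[1]
    return 0.5 * np.sum((x_hat - x) ** 2) / m

# Hypothetical sizes: 8x8 patches (64 inputs), 25 hidden units, 10 examples.
rng = np.random.default_rng(0)
n_inputs, n_hidden, m = 64, 25, 10
x = rng.uniform(0.1, 0.9, size=(n_inputs, m))
W1 = rng.normal(0.0, 0.1, size=(n_hidden, n_inputs))
b1 = np.zeros((n_hidden, 1))
W2 = rng.normal(0.0, 0.1, size=(n_inputs, n_hidden))
b2 = np.zeros((n_inputs, 1))

a2, x_hat = autoencoder_forward(x, W1, b1, W2, b2)
cost = reconstruction_cost(x, x_hat)
```

Because the hidden layer here has fewer units than the input (25 versus 64), the network cannot simply copy its input and has to learn a compressed representation.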

The argument above relies on the number of hidden units $s_{2}$ being small. But even when the number of hidden units is large (perhaps even greater than the number of input pixels), we can still discover interesting structure by imposing other constraints on the network. In particular, if we impose a sparsity constraint on the hidden units, so that each hidden unit is inactive (output close to 0) for most inputs, the auto-encoder will still discover interesting structure in the data.
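One common way to express such a sparsity constraint is a KL-divergence penalty between a small target activation $\rho$ and the average activation $\hat{\rho}_j$ of each hidden unit over the training set. The sketch below assumes this form; the function name and the default $\rho = 0.05$ are illustrative assumptions and are not taken from the linked code.

```python
import numpy as np

def kl_sparsity_penalty(a2, rho=0.05):
    """a2: hidden activations of shape (n_hidden, m), values in (0, 1).

    Returns sum_j KL(rho || rho_hat_j), where rho_hat_j is the mean
    activation of hidden unit j over the m examples.
    """
    rho_hat = np.mean(a2, axis=1)
    kl = (rho * np.log(rho / rho_hat)
          + (1.0 - rho) * np.log((1.0 - rho) / (1.0 - rho_hat)))
    return np.sum(kl)

# Units whose average activation matches the target incur (almost) no penalty;
# units that are active half of the time are penalized.
penalty_sparse = kl_sparsity_penalty(np.full((25, 10), 0.05))
penalty_dense = kl_sparsity_penalty(np.full((25, 10), 0.5))
```

In training, this penalty would typically be multiplied by a weight (often written $\beta$, also an assumption here) and added to the reconstruction cost.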

We implement the sparse auto-encoder algorithm, using the sigmoid function as the activation for both the Layer 2 (hidden) and Layer 3 (output) neurons, and train it on the following B/W natural images.

sample1

Figure 2: Pre-processed B/W natural image - Sample 1

sample2

Figure 3: Pre-processed B/W natural image - Sample 2


We learn features as shown below.

naturalwts

Figure 4: Edge-like features learned from random 8x8 patches sampled from the images
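The 8x8 training patches can be drawn at random from the larger images. The sketch below shows one way to do this; the function name, the number of patches, and the 64x64 stand-in images are assumptions for illustration only.

```python
import numpy as np

def sample_patches(images, num_patches, patch_size=8, seed=0):
    """images: array of shape (num_images, height, width).

    Returns an array of shape (patch_size * patch_size, num_patches),
    one flattened patch per column.
    """
    rng = np.random.default_rng(seed)
    num_images, height, width = images.shape
    patches = np.empty((patch_size * patch_size, num_patches))
    for k in range(num_patches):
        i = rng.integers(num_images)               # pick an image
        r = rng.integers(height - patch_size + 1)  # top-left row
        c = rng.integers(width - patch_size + 1)   # top-left column
        patches[:, k] = images[i, r:r + patch_size, c:c + patch_size].ravel()
    return patches

# Stand-in data: 10 random "images" in place of the pre-processed natural images.
images = np.random.default_rng(1).random((10, 64, 64))
patches = sample_patches(images, num_patches=100)
```

Each column of `patches` is then one training example $x^{(i)}$ with 64 input pixels.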


Sparse auto-encoders tend to learn features that resemble the receptive fields (RFs) of V1, the primary visual cortex of the brain; in other words, edge-like features.
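To see what a hidden unit has learned, a common approach is to display the norm-bounded input that maximally activates it. For a sigmoid unit $i$ with incoming weights $W_{ij}$, that input is $x_j = W_{ij} / \sqrt{\sum_j W_{ij}^2}$. The sketch below assumes the figures were produced along these lines; the function names and the random stand-in weights are hypothetical.

```python
import numpy as np

def max_activation_inputs(W1):
    """W1: hidden-layer weights of shape (n_hidden, n_inputs).

    Row i of the result is the unit-norm input that maximally
    activates hidden unit i.
    """
    norms = np.sqrt(np.sum(W1 ** 2, axis=1, keepdims=True))
    return W1 / norms

def as_patch_images(W1, patch_size=8):
    """Reshape each row into a patch_size x patch_size image for display."""
    return max_activation_inputs(W1).reshape(-1, patch_size, patch_size)

# Stand-in weights; in practice W1 would come from the trained auto-encoder.
W1 = np.random.default_rng(0).normal(size=(25, 64))
tiles = as_patch_images(W1)
```

Each entry of `tiles` can then be drawn as a small grayscale image and arranged in a grid.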

But when we learn features from artificial images (ones not found in nature), we obtain different types of features. For example, suppose we learn features from the handwritten digits shown below.

digits

Figure 5: Handwritten digits sampled from MNIST dataset


We learn features which look like pen strokes, as shown below.

weightdigits

Figure 6: Pen strokes learned over handwritten digits


As can be seen, these features make sense. The brain's visual system is also built on such low-level features: the oriented receptive fields of V1 can be described as combinations of the center-surround receptive fields found in the LGN and, before it, in the retina.


compdiag

Figure 7: Visual system in Brain (Eye to LGN to V1)


Next, we apply a sparse auto-encoder to the following color images, using the sigmoid function as the activation for the Layer 2 (hidden) neurons and a linear function as the activation for the Layer 3 (output) neurons.


colorimg

Figure 8: Color images sampled from STL-10 dataset


We obtain color features that look like color edges.


colorfeat

Figure 9: Colored and B/W edge-like features learned from random 8x8 patches sampled from the images
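The difference between the two setups is only in the output layer. With a sigmoid output, reconstructions are confined to the range (0, 1), so inputs must be scaled to fit; with a linear output (a linear decoder) the reconstruction can take any real value. The sketch below illustrates this under assumed sizes (8x8x3 = 192 inputs, 50 hidden units); the names and the random data are illustrative, not the linked implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def linear_decoder_forward(x, W1, b1, W2, b2):
    a2 = sigmoid(W1 @ x + b1)  # Layer 2 (hidden): sigmoid activation
    x_hat = W2 @ a2 + b2       # Layer 3 (output): linear activation
    return a2, x_hat

def output_delta(x, x_hat):
    """Output-layer error term for the squared-error cost.

    With a linear output the activation derivative is 1, so the
    sigmoid-derivative factor drops out of the usual backpropagation term.
    """
    return x_hat - x

rng = np.random.default_rng(0)
n_inputs, n_hidden, m = 8 * 8 * 3, 50, 10
x = rng.normal(size=(n_inputs, m))  # stand-in for zero-mean color patches
W1 = rng.normal(0.0, 0.1, size=(n_hidden, n_inputs))
b1 = np.zeros((n_hidden, 1))
W2 = rng.normal(0.0, 0.1, size=(n_inputs, n_hidden))
b2 = np.zeros((n_inputs, 1))

a2, x_hat = linear_decoder_forward(x, W1, b1, W2, b2)
delta3 = output_delta(x, x_hat)
```

The rest of backpropagation, including the sparsity term on the hidden layer, is unchanged by this choice of output activation.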


These edge-like and pen-stroke-like features generalize well: they are consistent with the structure of natural images and of artificial digit images respectively, which leads to high classification accuracy when they are used as inputs to a classifier. They are low-level features; higher-level features can be obtained by combining them.



