[Deep Learning] t-SNE Visualization

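t-SNE (t-distributed stochastic neighbor embedding) is a nonlinear dimensionality reduction technique. It is not a clustering algorithm itself; it tries to keep points that are neighbors in the high-dimensional space close together in the low-dimensional embedding, so clusters already present in the data tend to show up visually. That makes it well suited to visualizing high-dimensional feature vectors output by deep neural networks. (It serves a similar purpose to PCA, but PCA is a linear projection, while t-SNE can capture nonlinear structure.)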
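As a minimal sketch of the reduction step, the snippet below runs scikit-learn's `TSNE` on synthetic vectors that stand in for CNN features. The library choice, the synthetic data, and the parameter values (`perplexity=30.0`, `init="pca"`) are assumptions for illustration, not the setup used for the figures in this post.

```python
import numpy as np
from sklearn.manifold import TSNE

# Stand-in for CNN features (assumption): 300 samples of dimension 64,
# drawn around 3 synthetic class centers.
rng = np.random.default_rng(0)
centers = rng.normal(scale=10.0, size=(3, 64))
labels = np.repeat(np.arange(3), 100)
features = centers[labels] + rng.normal(size=(300, 64))

# Reduce to 2 dimensions. Perplexity must be smaller than the number of samples.
tsne = TSNE(n_components=2, perplexity=30.0, init="pca", random_state=0)
embedding = tsne.fit_transform(features)  # one (x, y) row per sample
```

With real data, `features` would be the activations you extracted from your network, one row per image.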

Usually we reduce the dimension to 2 so the result can be visualized in 2D space. A common way to visualize how the high-dim vectors cluster is to create a 2D grid and use each sample's calculated (x, y) as the coordinates at which to place its original image. An example is shown below; the data is CIFAR10 and the features are CNN feature vectors:


tsne embedding on CIFAR10 CNN features

And a zoomed-in version of one corner; the dataset is pretty well clustered:


zoomed in
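Placing each image at its (x, y) position can be sketched roughly as follows. This is a NumPy-only illustration, not the code that produced the figures above: `place_on_canvas`, the canvas size, and the random thumbnails are all hypothetical.

```python
import numpy as np

def place_on_canvas(coords, thumbs, canvas_size=512):
    """Paste each thumbnail at its normalized 2D embedding position."""
    n, h, w, c = thumbs.shape
    # Rescale embedding coordinates to [0, 1] along each axis.
    lo, hi = coords.min(axis=0), coords.max(axis=0)
    unit = (coords - lo) / np.maximum(hi - lo, 1e-12)
    # Start from a white canvas.
    canvas = np.full((canvas_size, canvas_size, c), 255, dtype=np.uint8)
    for (x, y), img in zip(unit, thumbs):
        # Top-left corner, scaled so the thumbnail always fits on the canvas.
        col = int(x * (canvas_size - w))
        row = int(y * (canvas_size - h))
        canvas[row:row + h, col:col + w] = img
    return canvas

# Hypothetical inputs: 50 embedded points and 50 random 32x32 RGB thumbnails.
rng = np.random.default_rng(0)
coords = rng.normal(size=(50, 2))
thumbs = rng.integers(0, 256, size=(50, 32, 32, 3), dtype=np.uint8)
canvas = place_on_canvas(coords, thumbs)
```

In this sketch, thumbnails pasted later overwrite earlier ones wherever they overlap. In dense regions you may prefer to snap each point to a grid cell and keep only one image per cell.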

Based on the t-SNE embedding, you can evaluate your trained network: do the learned features represent the images the way you want? You can also spot misclassified samples. But since this is a low-dimensional representation, the distances shown here don't necessarily reflect the real distances between clusters.
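One rough way to surface candidate misclassified samples is to flag points whose label disagrees with the majority label of their nearest neighbors in the embedding. This is a heuristic sketch, not part of t-SNE: the helper `neighbor_disagreement`, the choice of `k=5`, and the toy data are assumptions. It relies only on local neighborhoods, which t-SNE tends to preserve better than distances between clusters.

```python
import numpy as np

def neighbor_disagreement(embedding, labels, k=5):
    """Indices of points whose label differs from the majority of their k nearest neighbors."""
    diffs = embedding[:, None, :] - embedding[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    nearest = np.argsort(dist, axis=1)[:, :k]
    flagged = []
    for i, idx in enumerate(nearest):
        votes = np.bincount(labels[idx])
        if votes.argmax() != labels[i]:
            flagged.append(i)
    return np.array(flagged, dtype=int)

# Toy embedding: two tight, well-separated clusters of 10 points each.
rng = np.random.default_rng(0)
emb = np.vstack([rng.normal(0.0, 0.1, size=(10, 2)),
                 rng.normal(5.0, 0.1, size=(10, 2))])
lab = np.array([0] * 10 + [1] * 10)
lab[3] = 1  # simulate one sample sitting in the "wrong" cluster
suspects = neighbor_disagreement(emb, lab)
```

Flagged points are only suspects to inspect by eye. The pairwise distance matrix also makes this quadratic in the number of samples, so it is meant for small sets.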

A well-written MATLAB implementation is kindly provided by Andrej Karpathy on his tSNE JS demo page:  Tsne JS demo

Happy embedding!