The “information sieve” with bonus eigen-faces!


I just put up a new paper with the (hopefully) intriguing title “The Information Sieve”. The motivation is that when we humans look at the world, we tend to identify a new pattern or learn a new trick, and then move on to the next thing. There are two amazing things about this:

1. We don’t learn everything at once.

2. We don’t re-learn the same thing over and over again (usually).

These may seem inconsequential, but getting machines to learn this way turns out to be very difficult.

The information sieve introduces a new way of learning things piece by piece. There is some amount of information in whatever data we are looking at, but we don’t know how much (and it’s usually impossible to find out exactly, because of limited data and computation). We pass the data through the first layer of the sieve to extract the “most informative” pattern in the data; the data is then transformed, and the remaining information trickles down to the next layer of the sieve. This “remainder information” contains all the information in the original data except what was already learned. This allows incremental learning that is guaranteed to improve at each step and never duplicates effort by re-learning what is already known.
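The layer-by-layer idea can be sketched with a loose analogy in plain NumPy. To be clear, this is not the paper’s method (which optimizes total correlation over discrete variables); the hypothetical `sieve_layer` below just takes the top principal direction as a stand-in for the “most informative pattern” and passes the residual on as the remainder:

```python
import numpy as np

def sieve_layer(x):
    """Extract one 'most informative' direction (here just the top
    principal direction, a stand-in for the paper's total-correlation
    objective) and return the factor values, the loading vector, and
    the remainder the factor cannot explain."""
    _, _, vt = np.linalg.svd(x - x.mean(axis=0), full_matrices=False)
    w = vt[0]                       # learned loading vector
    y = x @ w                       # factor value for each sample
    remainder = x - np.outer(y, w)  # data with the factor projected out
    return y, w, remainder

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 5))       # toy data: 200 samples, 5 variables
loadings = []
for _ in range(3):                  # stack three sieve layers
    y, w, x = sieve_layer(x)        # x becomes the remainder
    loadings.append(w)

# Later layers never re-learn an earlier factor: each remainder has
# no component left along the loadings already extracted, so the
# loadings come out mutually orthogonal.
assert abs(float(loadings[0] @ loadings[1])) < 1e-8
```

The point of the toy is the invariant rather than the objective: because each layer hands down a remainder with its factor projected out, a later layer cannot rediscover what an earlier layer already captured.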

The bonus eigen-faces below are not in the paper, but they show what the sieve extracts at various layers when looking at a classic dataset called the Olivetti faces. Any face can “activate” any of these 10 learned factors. The blue/red coloring shows how different pixels in the image contribute to whether that factor is activated. One factor seems to correspond to faces looking left or right (bottom row, second from right). Others seem to focus on different parts of the face, reflecting facial expressions. Anyway, there is more to be done to make this method practical on larger datasets, but this seems to be a promising first step. (The paper also shows how the method applies to lossy and lossless compression and to independent component analysis, in case that is your bailiwick.)

Some eigen-faces learned with the information sieve.


2 Responses to “The “information sieve” with bonus eigen-faces!”

  1. Vlad

    I previously evaluated CorEx on some of my data, so it was natural to read this new paper. I am a little bit confused about the current implementation on GitHub. Can it be used as is, or are some modifications needed? If everything can be handled with calls to the existing functions, that would be great.
    Thank you very much for this method!

    • The version of CorEx on GitHub (from this NIPS paper) should work well for this. I’m perpetually planning to release a newer version, currently in development, that has several additional features. It incorporates advances from this paper, including continuous data and an overlapping tree structure, and it adds some Bayesian smoothing when learning marginal parameters. Since the sieve learns incrementally, the non-overlapping tree structure becomes irrelevant anyway. Also, the sieve as described so far is only for discrete variables. Anyway, I’ll happily share the development code with anyone who gets in touch with me.
