Clusters, Quantums, and Other Updates

21Oct13

Demystifying Information-Theoretic Clustering

Quick summary: Finding clusters in data is a fundamental problem in machine learning. You’d like to be able to do so without making any assumptions about your data. Information theory provides a good way to do this, but the first few attempts to do so have been fundamentally flawed. We fix the flaws and show that it leads to much better clusters. Almost every clustering paper shows a picture like the one on the left below. What they don’t show is that they have carefully picked the size of the circles so that their clustering method works. Our method (CVR) works for arbitrarily sized clusters.

Using information theory to find clusters is a nice idea, but previous attempts were wrong, and would only work for carefully hand-picked examples. Our method works for arbitrarily sized clusters.

Using information theory to find clusters is a nice idea, but previous attempts (NIC,MI) were totally wrong, and would only work for carefully hand-picked examples. Our method (CVR) works for arbitrarily sized clusters.

MAX 2-SAT with up to 108 qubits

Quick summary: If you read Shtetl-optimized, you’re aware of the ongoing drama surrounding the D-Wave quantum chip. There was a widely publicized paper showing some speed-ups in solving certain problems with the D-Wave chip. Those speed-ups evaporated when compared to state-of-the-art classical solvers. In the debate at Shtetl, Scott says:

In the comments, many people tried repeatedly to change the subject from [whether a speed-up has been demonstrated] to various subsidiary questions. For example: isn’t it possible that D-Wave’s current device will be found to provide a speedup on some other distribution of instances, besides the one that was tested?

That is basically what we were trying to explore in our paper. We have some knobs to tune the distribution of problems to see which ones are easier or harder. Can we reveal a difference in hardness for the classical and quantum solvers? The answer is a tentative yes, but there is more to be done here.

Other things

I went to the WIN workshop again this year. I talked about some recent ideas about detecting contagion. Although “latent homophily” has gotten a lot of discussion as a methodological problem, we showed that it can be overcome, while “external dynamics” provide a more fundamental (and frequently ignored) obstacle to detecting contagion.

Shuyang Gao gave an outstanding talk about “stylistic accommodation”. Have you ever noticed that when one person yawns, it makes other people yawn too? The question in this line of research is whether one person’s use of a particular style of speech affects the style of speech of people responding. Previous research had purported to find strong effects of this kind, but Shuyang showed that almost all of these correlations are due to confounding factors.

Finally, I’ll be in India in December, and I’m planning to visit the Tata Institute of Fundamental Research (in Mumbai) while I’m there.

Filed under: Posted by Greg Ver Steeg | Leave a Comment

No Responses Yet to “Clusters, Quantums, and Other Updates”

Feed for this Entry Trackback Address

Leave a Comment