A busy month


Next month on April 9 I’ll give a talk for UC Irvine’s AI/ML seminar. The next week I’ll fly to Lyon, France for WWW (the World Wide Web conference). I’ll be giving a talk at a workshop before WWW called “Making sense of micro posts”. Then I’ll present my and Aram Galstyan‘s paper “Information Transfer in Social Media” at WWW.

In the meantime, I hope to finish a paper for UAI while continuing to teach “Physics and Computation”. After all that, perhaps a visit this summer to a research group in Singapore? More details on that later.

I’m teaching a graduate course this term at USC with Aram Galstyan called “Physics and Computation”. Basically, it shows how we can use statistical physics to understand problems in computer science. Hopefully I’ll post some details about the class later.  For now, here’s the main text we’re using: Information, computation, and physics.

In a classic two-birds-one-stone maneuver, we’re thinking about some projects that the students could do that relate to the big question: what can we do with a D-Wave chip? Both 2-SAT instances and spin glass (or LDPC) codes are exactly representable as Ising models that can be solved on the chip. Conveniently, these are both problems we’ve been studying in detail in the course: their phase transitions, average complexity, best classical algorithms, etc.  Here’s a toast to a hopefully synergistic term.

It’s official


Lockheed bought a D-Wave quantum chip for USC which is now being installed at ISI. Press release. I got to see the large refrigerated, magnetically shielded box yesterday. Now, what shall we do with it?

(Edit: sorry, link fixed.)

I finally have a working paper up called Information transfer in social media, related to the talk I gave at WIN. Read on for a quick explanation.

Neurons in the brain give sporadic electrical spikes. How does the pattern of neuronal spikes correspond to a thought? Or, how are our thoughts coded as electrical spikes? Researchers in neuroscience turned to information theory as the most general mathematical framework we have for answering this question. Basically, this allows them to quantify how much information is contained in a signal of spikes, and to correlate that information with different external stimuli (e.g. a picture of a cat).

We imagine that each person in a social network is a neuron and each tweet or post corresponds to a spike of activity. Can we use information theory to decode what’s going on? If you follow someone on a social network, do they really affect your actions? Intuitively, we know that the answer is sometimes “no”; many of our Facebook “friends” are nothing of the sort.

We want to uncover the “real” network of connections that make people tick. We can do this in a statistical way with enough data. Roughly, we measure how much our uncertainty about what you will do next is reduced if we know what the person you are following has done.

The results are surprising. We are able to deduce a lot about what’s going on just from the timing of tweets. One person’s weak influence on a million followers may amount to less than another person’s strong influence on a hundred thousand followers. Another interesting result is that the most predictable activity on Twitter comes from spammers. I’m including some of the high information transfer clusters below. Some are in the paper and some are apparenthorizons.com exclusives.

(Flippant) commentary is in picture order. You can check out any of these accounts by going to twitter.com/username, though the spammier accounts have been banned.

Soccer cluster: delwardhk really loves his soccer. I can’t be sure whether he personally retweets all the regional soccer accounts, or if it’s automated.

Drug cluster: The international viagra drug cartel has finally been revealed through science.

Bieber cluster (only a piece): I don’t think this is spam, these people are really obsessed with Justin Bieber.

Webcam cluster: I didn’t explore this in detail. Apparently these young ladies will chat with you over their webcams. That sounds sweet.

Spam-of-all-trades: What’s your game friend? Are you into marketing, travel, escorts, or pantyhose? How can you justify cross-posting to all these accounts?

Boogie cluster: These are night club promoters. Boogie Fonzarelli is the greatest name ever constructed. My firstborn will have to be named Boogie Fonzarelli Ver Steeg.

This slideshow requires JavaScript.

UAI 2011 has ended, and it went really well. I was happy with my talk and surprised at how many people at UAI were interested in quantum stuff!

There were many interesting presentations but I want to mention one in particular because it’s on my mind and it has a nice connection to my talk.

My talk was about detecting the existence of hidden variables. This is only possible if the effect of the hidden variable is constrained. One route, which I took, is to say that the hidden variable may be arbitrarily complex but it doesn’t change in time, nor do actors’ dependence on it change in time. But there was an interesting paper which took a different route:

“Detecting low-complexity unobserved causes” by Dominik Janzing, Eleni Sgouritsa, Oliver Stegle, Jonas Peters, Bernhard Schölkopf

In this case, the effect of the hidden variable is constrained by assuming a low complexity hidden variable. This constrains the possible effects on observed variables and therefore gives you a signature to deduce their existence. I will be reading this paper closely!

I’m happy to unveil a new paper, “A sequence of relaxations constraining hidden variable models”.

Depending on your interests, I’m including two different overviews. One comes from the social networks perspective and the other from the quantum physics perspective. Fundamentally, they are both about detecting hidden variables.

I’ll be giving a plenary talk about this paper at UAI 2011. So if you like hidden variables and you like Barcelona, come see me there!

[Update: Best paper runner-up!]

Quantum perspective

When people like Einstein realized that quantum physics predicts that the outcomes of some experiments are uncertain, he, logically, concluded that the theory is incomplete. There must be some hidden variable that would allow you to predict the outcome perfectly. “God does not play dice.”

On the other hand, Bell’s theorem provides a remarkable retort to this view. Bell showed that there is a limit to how strongly correlated two particles can be if the following assumptions are true:

1) Free will. (We are able to freely choose between two incompatible measurements.)

2) No faster than light signaling. (My choice of measurement on one particle is not instantly transmitted to the other particle.)

3) Hidden variables. (This hidden variable would perfectly predict the outcome of my experiment, regardless of which distant measurements were chosen.)

These assumptions limit how strongly correlated two particles can be. “Entangled” particles in quantum physics go beyond this limit and therefore violate at least one of these assumptions. Einstein called this “spooky action at a distance”.

One amazing thing about Bell’s test is that it made no assumption on what kind of “local hidden variables” were allowed. Despite this, he was able to find a test that conclusively rules them out as an explanation. My paper is, essentially, about how we can easily find tests for hidden variables in other contexts. (The answer: finding the best test can be hard, but we can find good tests using semidefinite relaxations.)

Social networks perspective

You may remember a recent study in the news that said “obesity is contagious“. You may have wondered if it is really possible to show such a thing. Cosma Shalizi wondered the same thing (paper here)and showed that the answer is, generally, no. You can always find an equally good explanation without invoking contagion or influence, that attributes correlations to some hidden variable. For instance, there is some hidden attribute of Alice and Bob that causes them to become friends, and also predisposes them to becoming obese.

How much correlation on a social network can be attributed to these “hidden variable theories”? If these properties are true:

1) No influence

2) Hidden variables (Each person’s actions depend only on their previous actions and the hidden variable.)

3) Stationarity (The hidden variable does not change with time.)

Then we can derive a limit to how correlated two people’s actions can be. Violation of this limit implies either that some influence is involved, or that the hidden variable is changing in time.

Intuitively, imagine I flip a coin and I start calling out the results: “Heads, Heads, Tails, Heads…”. Now, my friend Ditto starts calling out shortly thereafter: “Heads, Heads, Tails, Heads…”. At some point we can be fairly convinced I’m influencing Ditto. The paper just quantifies when we can do this and how confident we can be!

My ISI web page got moved to a new framework, so it could get integrated with our group web page more easily.

After several hours of work, the result is marginally better looking than my old web page.

Coming soon…


I’m giving a talk at USC on May 20 for the quantum information and condensed matter physics seminar. It will be about phase transitions in the graph partitioning problem along with wild speculation about the implications for adiabatic quantum computation.

I’m also going to the International Conference on Complex Systems this summer, talking about the partitioning problem but without the wild speculation.

I should hear back from UAI soon about my paper on using semidefinite relaxations to create statistical tests for hidden variables. After I hear back, I’ll post the paper on arxiv along with an explanation here.

That conference (UAI) is in Barcelona this summer. I already mentioned I’ll be at ICWSM, and also at another workshop about the “Future of Social Web”.

It will be a busy summer!

The last Big Bang Theory episode was about information diffusion in social networks. The science consultant for the show asked my colleague Kristina Lerman to write about the topic for the Big Bang Theory blog. She mentions our recent paper (previous post). Unfortunately, although she gave the science consultant some of our graphs to put in the show, none of them were in the final cut.  I never thought I’d do research of interest to popular culture!

I have a new revision available on arxiv of the paper What stops social epidemics? I’ll be presenting the paper at ICWSM 2011 in Barcelona.

How likely is it for a virus or piece of information to spread through a network? If, on average, each person spreads the virus to more than one person than the virus will propagate through the entire network, an epidemic. If, on the other hand, each person spreads it to less than one person, the outbreak will die out very quickly, most likely after a handful of people have seen it.

For social epidemics on the Digg network, neither of these things happened. Stories would spread to a few hundred friends in the network, then stop without reaching even 0.1% of the whole network. There are many reasons for the slow-down of epidemics, but we identified two crucial, complementary ingredients.

  • Most people who are exposed to a story are exposed by multiple friends
  • More friends exposing you to a story does not make you more likely to spread the story yourself

Why does this slow down the epidemic? The first (primary) story spreader exposes some number of new people. Some fraction of those people are interested and spread the story themselves. If the people following these new (secondary) spreaders were a new, totally random group of people that looked just like the one the first spreader had, the story would merrily keep spreading. This kind of assumption is, roughly, a “mean field approximation”. But that’s not what happens. Instead, many of the friends of the secondary spreaders were already exposed by the primary spreader. Those people didn’t spread the story after one exposure, and they will continue to ignore it now. Fewer new people are reached in each round of spreading until the story dies out with a whimper.