New preprint: The title is changing to: “Information-Theoretic Measures of Influence Based on Content Dynamics”. I’ll give a detailed, readable summary in a week or two. I’ll be presenting this at the WIN workshop, so please come!

For now, imagine the following problem. There are hundreds of people in a large room talking to each other. You can hear what everybody is saying, but you don’t know who’s talking to whom. How could you figure it out, based on what they are saying? If one person says something, and another responds, we can usually tell that the response fits with the original statement somehow. We quantify this intuition using information theory. If we interpret both people’s utterances as arbitrary signals, then we can use very general information-theoretic tools to tell us if one signal is predictable from the other. That is, one person’s statements are more predictable if we know what the other person is saying.
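
One common way to make “predictable from the other” precise is mutual information. This is a generic textbook formulation, not necessarily the exact measure used in the preprint, and X and Y are placeholder symbols for the two people’s utterances, treated as random variables:

```latex
I(X;Y) \;=\; H(Y) - H(Y \mid X) \;=\; \sum_{x,y} p(x,y)\,\log \frac{p(x,y)}{p(x)\,p(y)}
```

Here H(Y) is the uncertainty about one person’s statements on their own, and H(Y | X) is what remains once the other person’s statements are known. The difference is zero exactly when the two signals are statistically independent.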

Sadly, I was not born with a sensical one-word last name, leading to various problems throughout life. For googling purposes, the all-too-common misspellings:
GV Steeg
Greg Steeg
Greg VerSteeg

A busy month


Next month, on April 9, I’ll give a talk for UC Irvine’s AI/ML seminar. The next week I’ll fly to Lyon, France for WWW (the World Wide Web conference). I’ll be giving a talk at a workshop before WWW called “Making sense of micro posts”. Then I’ll present the paper I wrote with Aram Galstyan, “Information Transfer in Social Media”, at WWW.

In the meantime, I hope to finish a paper for UAI while continuing to teach “Physics and Computation”. After all that, perhaps a visit this summer to a research group in Singapore? More details on that later.

I’m teaching a graduate course this term at USC with Aram Galstyan called “Physics and Computation”. Basically, it shows how we can use statistical physics to understand problems in computer science. Hopefully I’ll post some details about the class later. For now, here’s the main text we’re using: Information, Physics, and Computation.

In a classic two-birds-one-stone maneuver, we’re thinking about some projects that the students could do that relate to the big question: what can we do with a D-Wave chip? Both 2-SAT instances and spin glass (or LDPC) codes are exactly representable as Ising models that can be solved on the chip. Conveniently, these are both problems we’ve been studying in detail in the course: their phase transitions, average complexity, best classical algorithms, etc.  Here’s a toast to a hopefully synergistic term.
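
To make “exactly representable as Ising models” concrete, here is a minimal sketch of one standard encoding for 2-SAT. It assumes spins of +1 for true and -1 for false, and clauses written as pairs of signed integers (3 means x3, -3 means NOT x3). The function names and the toy instance are hypothetical, and the sketch says nothing about how such a model would be embedded on actual hardware. Each clause contributes the penalty (1 - a·s_i)(1 - b·s_j)/4, which is 1 when the clause is violated and 0 otherwise, so the total energy counts violated clauses.

```python
from collections import defaultdict
from itertools import product


def two_sat_to_ising(clauses):
    """Return (offset, h, J) so that the Ising energy equals the number
    of violated clauses. Literals are signed ints: 3 is x3, -3 is NOT x3."""
    offset = 0.0
    h = defaultdict(float)  # local fields, keyed by variable index
    J = defaultdict(float)  # couplings, keyed by (i, j) with i < j
    for lit1, lit2 in clauses:
        i, j = abs(lit1), abs(lit2)
        a = 1 if lit1 > 0 else -1
        b = 1 if lit2 > 0 else -1
        # Expand (1 - a*s_i)(1 - b*s_j)/4 into constant, field, and coupling terms.
        offset += 0.25
        h[i] -= 0.25 * a
        h[j] -= 0.25 * b
        if i == j:
            offset += 0.25 * a * b  # s_i * s_i = 1, so this term is a constant
        else:
            J[(min(i, j), max(i, j))] += 0.25 * a * b
    return offset, dict(h), dict(J)


def ising_energy(offset, h, J, spins):
    """Energy of a spin assignment; spins maps variable index to +1 or -1."""
    energy = offset
    energy += sum(field * spins[i] for i, field in h.items())
    energy += sum(c * spins[i] * spins[j] for (i, j), c in J.items())
    return energy


def count_violations(clauses, spins):
    """Directly count clauses in which both literals are false."""
    def is_true(lit):
        return spins[abs(lit)] == (1 if lit > 0 else -1)
    return sum(1 for l1, l2 in clauses if not (is_true(l1) or is_true(l2)))


# Toy instance (illustrative only): check the encoding on every assignment.
clauses = [(1, 2), (-1, 3), (-2, -3), (1, -3)]
offset, h, J = two_sat_to_ising(clauses)
for bits in product([-1, 1], repeat=3):
    spins = {1: bits[0], 2: bits[1], 3: bits[2]}
    print(spins, ising_energy(offset, h, J, spins), count_violations(clauses, spins))
```

A satisfying assignment is then any spin configuration with energy zero, which is the ground state whenever the instance is satisfiable.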

It’s official


Lockheed bought a D-Wave quantum chip for USC which is now being installed at ISI. Press release. I got to see the large refrigerated, magnetically shielded box yesterday. Now, what shall we do with it?

(Edit: sorry, link fixed.)

I finally have a working paper up called Information transfer in social media, related to the talk I gave at WIN. Read on for a quick explanation.

Neurons in the brain give sporadic electrical spikes. How does the pattern of neuronal spikes correspond to a thought? Or, how are our thoughts coded as electrical spikes? Researchers in neuroscience turned to information theory as the most general mathematical framework we have for answering this question. Basically, this allows them to quantify how much information is contained in a signal of spikes, and to correlate that information with different external stimuli (e.g. a picture of a cat).

We imagine that each person in a social network is a neuron and each tweet or post corresponds to a spike of activity. Can we use information theory to decode what’s going on? If you follow someone on a social network, do they really affect your actions? Intuitively, we know that the answer is sometimes “no”; many of our Facebook “friends” are nothing of the sort.

We want to uncover the “real” network of connections that make people tick. We can do this in a statistical way with enough data. Roughly, we measure how much our uncertainty about what you will do next is reduced if we know what the person you are following has done.
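
The uncertainty reduction described above is often formalized as transfer entropy. The sketch below is a minimal plug-in estimator for two binary activity sequences (1 for “posted in this time bin”, 0 otherwise) with a history of a single time step. The function name `transfer_entropy`, the one-step history, and the toy data are illustrative assumptions, not the estimator used in the paper.

```python
import random
from collections import Counter
from math import log2


def transfer_entropy(source, target):
    """Plug-in estimate, in bits, of how much knowing source[t] reduces
    uncertainty about target[t+1] beyond what target[t] already tells us."""
    triples = Counter()  # counts of (target[t+1], target[t], source[t])
    for t in range(len(target) - 1):
        triples[(target[t + 1], target[t], source[t])] += 1
    n = sum(triples.values())

    # Marginal counts needed for the two conditional probabilities.
    now_pairs = Counter()   # (target[t], source[t])
    next_pairs = Counter()  # (target[t+1], target[t])
    now_single = Counter()  # target[t]
    for (y_next, y_now, x_now), c in triples.items():
        now_pairs[(y_now, x_now)] += c
        next_pairs[(y_next, y_now)] += c
        now_single[y_now] += c

    te = 0.0
    for (y_next, y_now, x_now), c in triples.items():
        p_with_source = c / now_pairs[(y_now, x_now)]
        p_without_source = next_pairs[(y_next, y_now)] / now_single[y_now]
        te += (c / n) * log2(p_with_source / p_without_source)
    return te


# Toy data (illustrative only): the follower copies the leader one step later.
rng = random.Random(0)
leader = [rng.randint(0, 1) for _ in range(5000)]
follower = [0] + leader[:-1]

te_forward = transfer_entropy(leader, follower)   # leader -> follower
te_backward = transfer_entropy(follower, leader)  # follower -> leader
print(te_forward, te_backward)
```

In this toy setup the estimate from leader to follower should be large and the reverse direction close to zero, which is the asymmetry that lets the measure pick out who is influencing whom. On real data, plug-in estimates are biased for short sequences, so this is only a starting point.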

The results are surprising. We are able to deduce a lot about what’s going on just from the timing of tweets. One person’s weak influence on a million followers may amount to less than another person’s strong influence on a hundred thousand followers. Another interesting result is that the most predictable activity on Twitter comes from spammers. I’m including some of the high information transfer clusters below. Some are in the paper and some are exclusives.

(Flippant) commentary is in picture order. You can check out any of these accounts by going to, though the spammier accounts have been banned.

Soccer cluster: delwardhk really loves his soccer. I can’t be sure whether he personally retweets all the regional soccer accounts, or if it’s automated.

Drug cluster: The international viagra drug cartel has finally been revealed through science.

Bieber cluster (only a piece): I don’t think this is spam; these people are really obsessed with Justin Bieber.

Webcam cluster: I didn’t explore this in detail. Apparently these young ladies will chat with you over their webcams. That sounds sweet.

Spam-of-all-trades: What’s your game, friend? Are you into marketing, travel, escorts, or pantyhose? How can you justify cross-posting to all these accounts?

Boogie cluster: These are night club promoters. Boogie Fonzarelli is the greatest name ever constructed. My firstborn will have to be named Boogie Fonzarelli Ver Steeg.


UAI 2011 has ended, and it went really well. I was happy with my talk and surprised at how many people at UAI were interested in quantum stuff!

There were many interesting presentations but I want to mention one in particular because it’s on my mind and it has a nice connection to my talk.

My talk was about detecting the existence of hidden variables. This is only possible if the effect of the hidden variable is constrained. One route, which I took, is to say that the hidden variable may be arbitrarily complex but it doesn’t change in time, nor does the actors’ dependence on it. But there was an interesting paper which took a different route:

“Detecting low-complexity unobserved causes” by Dominik Janzing, Eleni Sgouritsa, Oliver Stegle, Jonas Peters, Bernhard Schölkopf

In this case, the effect of the hidden variable is constrained by assuming that the hidden variable has low complexity. This limits its possible effects on the observed variables and therefore gives you a signature from which to deduce its existence. I will be reading this paper closely!

