Decoding the Twitter brain

13Oct11

I finally have a working paper up called “Information transfer in social media,” related to the talk I gave at WIN. Read on for a quick explanation.

Neurons in the brain give off sporadic electrical spikes. How does the pattern of neuronal spikes correspond to a thought? Or, how are our thoughts coded as electrical spikes? Researchers in neuroscience have turned to information theory as the most general mathematical framework we have for answering this question. Basically, it allows them to quantify how much information is contained in a train of spikes, and to correlate that information with different external stimuli (e.g. a picture of a cat).

We imagine that each person in a social network is a neuron and each tweet or post corresponds to a spike of activity. Can we use information theory to decode what’s going on? If you follow someone on a social network, do they really affect your actions? Intuitively, we know that the answer is sometimes “no”; many of our Facebook “friends” are nothing of the sort.

We want to uncover the “real” network of connections that make people tick. We can do this in a statistical way with enough data. Roughly, we measure how much our uncertainty about what you will do next is reduced if we know what the person you are following has done.
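For the curious, here is a rough sketch of that "reduction in uncertainty" idea, written as a plug-in estimate of transfer entropy. It is an illustration under my own simplifying assumptions, not necessarily the estimator used in the paper: activity is chopped into time bins (1 if the user posted in that bin, 0 otherwise), only one bin of history is used, and the name `transfer_entropy` is made up for this example.

```python
import random
from collections import Counter
from math import log2


def transfer_entropy(source, target):
    """Plug-in estimate (in bits) of transfer entropy from source to target.

    Assumes equal-length sequences of discrete symbols (e.g. 0/1 per time bin)
    and a history length of one bin. It measures how much knowing source[t]
    reduces uncertainty about target[t+1] beyond what target[t] already tells us.
    """
    if len(source) != len(target):
        raise ValueError("source and target must have the same length")

    # Count occurrences of (target's next value, target's past, source's past).
    triples = Counter()
    for t in range(len(target) - 1):
        triples[(target[t + 1], target[t], source[t])] += 1

    n = sum(triples.values())
    if n == 0:
        return 0.0

    # Marginal counts needed for the two conditional probabilities.
    past_both = Counter()    # (target past, source past)
    next_and_own = Counter() # (target next, target past)
    own_past = Counter()     # target past
    for (b_next, b_prev, a_prev), c in triples.items():
        past_both[(b_prev, a_prev)] += c
        next_and_own[(b_next, b_prev)] += c
        own_past[b_prev] += c

    te = 0.0
    for (b_next, b_prev, a_prev), c in triples.items():
        p_given_both = c / past_both[(b_prev, a_prev)]
        p_given_own = next_and_own[(b_next, b_prev)] / own_past[b_prev]
        te += (c / n) * log2(p_given_both / p_given_own)
    return te


# Synthetic example (made-up data): B copies whatever A did one bin earlier.
random.seed(0)
a = [random.randint(0, 1) for _ in range(1000)]
b = [0] + a[:-1]

te_a_to_b = transfer_entropy(a, b)  # expected to be large: A predicts B
te_b_to_a = transfer_entropy(b, a)  # expected to be near zero
print(f"A -> B: {te_a_to_b:.3f} bits, B -> A: {te_b_to_a:.3f} bits")
```

The asymmetry is the point: the measure is directed, so a follower who echoes someone shows up as high transfer in one direction only. With real data you would also have to worry about bin size, longer histories, and the bias of this simple estimator on short time series.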

The results are surprising. We are able to deduce a lot about what’s going on just from the timing of tweets. One person’s weak influence on a million followers may amount to less than another person’s strong influence on a hundred thousand followers. Another interesting result is that the most predictable activity on Twitter comes from spammers. I’m including some of the high information transfer clusters below. Some are in the paper and some are apparenthorizons.com exclusives.

(Flippant) commentary is in picture order. You can check out any of these accounts by going to twitter.com/username, though the spammier accounts have been banned.

Soccer cluster: delwardhk really loves his soccer. I can’t be sure whether he personally retweets all the regional soccer accounts, or if it’s automated.

Drug cluster: The international viagra drug cartel has finally been revealed through science.

Bieber cluster (only a piece): I don’t think this is spam; these people really are obsessed with Justin Bieber.

Webcam cluster: I didn’t explore this in detail. Apparently these young ladies will chat with you over their webcams. That sounds sweet.

Spam-of-all-trades: What’s your game, friend? Are you into marketing, travel, escorts, or pantyhose? How can you justify cross-posting to all these accounts?

Boogie cluster: These are night club promoters. Boogie Fonzarelli is the greatest name ever constructed. My firstborn will have to be named Boogie Fonzarelli Ver Steeg.


Posted by Greg Ver Steeg | 2 Comments

2 Responses to “Decoding the Twitter brain”


1 z247 on October 13, 2011 said:

Greg, I’m not following the slides too well. I assume the arrows show the flow of influence, which in this case would be tweets or retweets on the same topic?

And are you then saying that more intimate relationships (defined as… more interaction vs. one-way communication? More tweets to fewer recipients?), are more likely to continue the conversation/retweet as measured by the amount of time between replies?

Or is this all explained in the actual paper, which I’m too lazy to read right now?

- 2 gv on October 13, 2011 said:
  
  You’re right that the arrows are meant to depict a flow of influence.
  We looked at all the existing (directed) edges A->B (where B is a follower of A). Then we calculated the (also directed) information transfer from A to B. This gives a measure of how predictable B’s activity is if we know A’s recent activity. Or, to be more specific: after A tweets, how much more likely is B to tweet? Note that this is not always in the form of retweets. It could be a conversational reply, or reposting without an “RT” (as the spammers do).
  In these graphs I’ve kept only the edges with very high information transfer, so for most of these users I’m showing fewer than 1% of their actual links.
  
  I should also point out that our dataset consisted of all tweets containing URLs over a period of about two weeks. This cuts out a lot of interesting stuff.
  