Archive for the 'Media' Category

Black swans, and the problem with prediction markets

Thursday, July 12th, 2007

“Who knows what’s going to happen?/Lottery or car crash/Or you join a cult.” - Bjork, “Possibly Maybe”

I’m reading “The Black Swan”, the new book by Nassim Nicholas Taleb, whose “Fooled by Randomness” I read a few months ago and really liked. I didn’t think there was much more for him to say on the topic of uncertainty, but this book proves that wrong: in fact, there’s quite a bit more to say. Whereas the first one focused on human psychology and all the various ways we fool ourselves into thinking we can predict the future, this one takes a more mathematical tone, explaining why the future is inherently unpredictable. This is a very big statement: after all, maybe the only reason that we can’t predict the future very well is that each of us is cursed with our inherent biases, and limited information. If that were true, then if you could aggregate everyone’s thoughts, using, say, a prediction market, you’d have a good chance of getting at the truth.

Prediction markets: 2004-era buzzword, and the inspiration for my own Betocracy site. It’s far from a dead concept, with a site like Media Predict, launched two months ago, which is designed to help media companies figure out how well their movie, music and book properties will sell. And yes, Betocracy is still operational, though in all honesty I’ve lost interest in it; and so, apparently, has the world. (No need for sympathy, please! It was an important learning experience, I think.)

Anyway, the “holy grail”, to anyone who’s been interested in prediction markets, is James Surowiecki’s 2004 book “The Wisdom of Crowds”, the book which directly inspired me, and which I still have a high opinion of (though I may have to rethink some of my praise). Surowiecki captured many people’s imaginations with his examples of large groups making uncanny predictions. There was the first such demonstration, in which a crowd at a 1906 country fair guessed the weight of an enormous ox to within a pound or two. There are horse-race crowds, who collectively have odds-making abilities that are nearly unbeatable. And more recently, there are election-prediction markets, which have consistently beaten the polls in predicting election results. So, to extrapolate, asks the book (and many people), why can’t we use prediction markets as an all-around forecasting tool? For movie grosses, say, or flu outbreaks, or terrorist strikes?

Taleb doesn’t directly talk about prediction markets, though he does talk about capital markets, which are just a more established version of the same thing. But his logic can be easily applied. All of these things have something in common: the weight of an ox (okay, that’s really an observation and not a prediction, but you could phrase it as some sort of prediction), sporting events, political elections. Taleb says that they all fall within the world of what he calls “Mediocristan”, which is not a comment on their quality but rather on the nature of their probabilities. If you plotted the possible outcomes of any of these, they’d all end up in a nice bell curve graph, where, once you get outside of a rather narrow range of possibilities near the center, the probability of an outcome declines dramatically. The chance of a U.S. presidential candidate winning anything more than 65% of the vote, for instance, is rather small; more than 80%, nearly impossible. Similarly, if you ran the same set of horses against one another over and over, the times for each horse would be fairly similar from one run to the next - for a horse to suddenly double or halve its usual racing speed is unheard of.

Most of real life, on the other hand, according to Taleb, takes place in what he calls “Extremistan”. There, there’s no nice trailing-off around the center. Things like personal income, product success, and the severity of wars all fall into this category. For every person who makes a certain amount of money, for instance, there’s a very real chance that someone else will be making twice that much, and someone else ten times as much, regardless of what that original number was. Things that happen in Extremistan are much more unpredictable for just that reason. That’s why the prediction market Hollywood Stock Exchange, which gets headlines for predicting Oscar winners, fails spectacularly when it comes to guessing box office revenues (and there’s a link I wish I had read before starting Betocracy; though who knows if it would have had any effect on me at the time.)

There’s a mathematical explanation for the difference between the two “worlds” of Mediocristan and Extremistan, and it has to do with conditional vs. independent probabilities. In the Mediocristan world of sports, elections, etc., all the factors going into the final outcome are fairly independent of one another: the number of points a team scores in the first half of a game doesn’t really affect the number of points they score in the second; whether a person votes for a certain candidate doesn’t affect whether their neighbor will vote for that candidate. Thus, for a result to be significantly different from expectations, many things would have to go right (or wrong) independently - enough to make such a result all but impossible. On the other hand, in Extremistan, every event affects every subsequent event. If a book sells a million copies, bookstores begin displaying it prominently; the author gets invited on talk shows to plug it, etc: selling the next million becomes a much easier proposition. Similarly with the price of a stock, or the success of a website, or really most of the other interesting questions in life. On the negative side, events like wars can easily snowball as well. Taleb notes that before World War I, which is a classic case of a small event mushrooming completely out of control, stock markets in Europe were doing good business - no one had any inkling of the grand tragedy that was just about to befall them.
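To make the independent-vs-conditional distinction concrete, here is a toy simulation of book sales. It is only a sketch under my own assumptions, not Taleb's math: the function names, the 100 books and the 50,000 buyers are all made up for illustration. In the first model each buyer picks a book at random, ignoring everyone else; in the second, each buyer picks a book with probability proportional to one plus its sales so far, so every sale makes the next one more likely.

```python
import random
from statistics import median


def independent_sales(n_books, n_buyers, rng):
    """Mediocristan-style toy model: every buyer picks a book uniformly at random."""
    sales = [0] * n_books
    for _ in range(n_buyers):
        sales[rng.randrange(n_books)] += 1
    return sales


def reinforcing_sales(n_books, n_buyers, rng):
    """Extremistan-style toy model: a book is picked with probability
    proportional to (1 + its sales so far)."""
    sales = [0] * n_books
    # One "ticket" per book to start with; every sale adds another ticket
    # for that book, so a uniform draw from the tickets is a weighted draw.
    tickets = list(range(n_books))
    for _ in range(n_buyers):
        book = rng.choice(tickets)
        sales[book] += 1
        tickets.append(book)
    return sales


def lopsidedness(sales):
    """Ratio of the best seller's sales to the median book's sales."""
    return max(sales) / median(sales)


if __name__ == "__main__":
    rng = random.Random(2007)  # arbitrary seed, only for repeatability
    for name, model in [("independent", independent_sales),
                        ("reinforcing", reinforcing_sales)]:
        result = model(100, 50_000, rng)
        print(f"{name}: best seller / median = {lopsidedness(result):.2f}")
```

In the independent model the best seller should end up only slightly ahead of the typical book, while in the reinforcing model it should be several times ahead, even though all the books start out identical. Real book sales are surely messier than this; the point is only that feedback between events, by itself, is enough to stretch out the distribution.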

So there’s a mathematical basis for explaining why the systems that do so well in predicting certain outcomes will fail at all the rest. And why we’ll have to remain in the dark about the really important issues, like maybe the most pressing unknown of the day: whether Iran will “push the button”, to quote a contemporary Israeli song. And it goes without saying that, in retrospect, that might not even be the thing we need to worry about the most.

UPDATE: Sorry I was too harsh about the Hollywood Stock Exchange - “fails spectacularly” was sort of a spur-of-the-moment phrase on my part, and probably unwarranted.

UPDATE 2: Oh, damn, Taleb linked to this post! I wouldn’t have predicted that.

Assignment Zero

Monday, June 25th, 2007

I got interviewed for NewAssignment.net’s Assignment Zero; the organization/website does crowdsourced journalism, meaning that they suggest topics and anyone who wants to can do the research and write the article about it. “Assignment Zero” is their first such “assignment”, a set of interviews with lots of people about, appropriately, crowdsourcing.

I was going to wait to link to it, since most of the interviews will eventually be published on Wired.com, but I think they’ve already been “published” now, on NewAssignment.net (hard to tell, but I think so). Here’s the interview with me, in which I share my thoughts about Discourse DB, the Semantic Forms extension and semantic wikis, plus musings about crowdsourcing and democracy.

I’m quite pleased to be interviewed among such a group of heavy-hitters: there are personal heroes of mine like Wikipedia co-founder and head Jimmy Wales and “The Wisdom of Crowds” author James Surowiecki, and heavy thinkers like Clay Shirky; even one of my college professors is there - Henry Jenkins (I took “Introduction to Media Studies”).

Many thanks to Nate Olson for conducting the interview.

Newspapers giving up free content?

Monday, May 7th, 2007

The editor of the Arkansas Democrat-Gazette criticizes newspapers offering free content in an editorial in the Wall Street Journal (which happens, probably not coincidentally, to be the king of paid content):

The Inland Cost and Revenue Study shows that newspapers will generate between $500 and $900 in revenue per subscriber per year. But a newspaper’s Web site typically generates $5 to $10 per unique visitor per year. It may be that newspaper Web sites as an advertising medium, and free news, just can’t generate the revenue to sustain a valued news operation.

An interesting counterpoint to the usual “put all your information online for free, or risk looking like dinosaurs!” arguments. And those number differences are dramatic; if you need to gain 100 Web readers for every print subscriber you lose, that’s a big challenge.
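For what it's worth, here is the back-of-the-envelope arithmetic behind that "100 Web readers" figure, as a small sketch. The dollar ranges are the ones quoted from the editorial; the function and constant names are my own, and treating "unique visitor" as interchangeable with "Web reader" is a simplifying assumption.

```python
# Yearly revenue figures as quoted from the editorial (dollars).
PRINT_REVENUE_PER_SUBSCRIBER = (500, 900)  # low, high
WEB_REVENUE_PER_VISITOR = (5, 10)          # low, high


def web_readers_needed(print_revenue, web_revenue):
    """Unique Web visitors needed to match one print subscriber's yearly revenue."""
    return print_revenue / web_revenue


def replacement_range(print_range, web_range):
    """Best and worst case: cheapest subscriber vs. most lucrative visitor,
    then priciest subscriber vs. least lucrative visitor."""
    best = web_readers_needed(print_range[0], web_range[1])
    worst = web_readers_needed(print_range[1], web_range[0])
    return best, worst


if __name__ == "__main__":
    best, worst = replacement_range(PRINT_REVENUE_PER_SUBSCRIBER,
                                    WEB_REVENUE_PER_VISITOR)
    print(f"Web readers per lost subscriber: between {best:.0f} and {worst:.0f}")
```

Matching the low ends of both ranges ($500 against $5) gives the 100-to-1 ratio; mixing the extremes gives anywhere from 50 to 180 Web readers per lost subscriber.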

Personally, I like the approach that a lot of small-to-medium newspapers seem to take, which is to put all their articles and columns online for a week or two, after which they’re archived and only available to subscribers. That gives enough time for sites like Wikinews and, well, Discourse DB to summarize their contents, essentially doing those newspapers’ archiving work for them.

Then again, I don’t run a newspaper; it could be that demand for an article or column drops rapidly after the first few days that it runs, so archives wouldn’t get widely read anyway.

“I read the news today…”

Tuesday, January 9th, 2007

Check out Daylife, the new news-aggregation site - it’s pretty cool. At first glance it seems slow and pretentious - do they really need 3 tabs and 7 sub-tabs to show you the day’s news? Drudge Report manages fine with just one page, as does Google News with one page and a bunch of “next”s. The page is too weighed down with text and images - I can’t imagine clicking on an image that says “White House” just to read whatever top stories there are about the White House today.

So the site’s a bit overloaded. Also, it may or may not be named after the Beatles’ “A Day in the Life”. That could be just a wild guess.

Anyway, where Daylife really shines is in its search. Check out the Daylife search page for, say, Sarbanes-Oxley, the accounting act that’s still the most-read topic on Discourse DB. They have a whole range of articles and commentary from different newspapers and magazines (in theory they have blog posts too, but not for this issue), all of them relevant. On the right are photos of Chris Cox, the SEC commissioner, who’s the man most closely associated with the issue at the moment. Compare that with Google News’ search page on the same topic - there’s a lot of press releases, some really tangential articles, and obscure publications. The photos that appear are random and unrelated to anything.

For the sake of completeness, here’s Discourse DB’s Sarbanes-Oxley page - the easiest-to-navigate of the three, in my opinion, but then again this one’s not a search page so it’s not really a fair comparison. Just wanted to stick it in, to cleanse the palate a little.

I don’t know how Daylife manages to out-search Google, but they do. It’s a neat tool.

“Is that all there is?”

Wednesday, September 13th, 2006

The project I’m working on is not quite ready yet, so let me stall for time by talking about a concept I’ve been thinking about recently. This may have already been covered by other people before, but: there’s one nice feature that some blogs and many news sources have that makes them valuable; for want of a better word, let me call it “comprehensiveness”. It’s the idea that, in addition to all the current information that you can read or watch in that source (newspaper, news channel, blog, or otherwise), there’s one more piece of information that you’re getting: that there’s no bigger piece of news out there than what you see in front of you. If you read any major daily newspaper, there’s an implicit guarantee that there was no bigger piece of news from yesterday than what’s at the top of the front page. If you read a news aggregator site like Drudge Report, you know that no bigger news has happened than what’s at the top of the page, as of at least a few hours ago; for Google News or Wikinews, it’s the same thing but more like 15 minutes. And it’s a concept that holds up not just for general news sources or sites: if you’re interested in the goings-on of radical Islam, if you go to Little Green Footballs you have a pretty good guarantee that nothing bigger than what you see there happened in the world of radical Islam during the last few days. If you want to know about the excesses of the Republican party, you can go to Crooks and Liars and have a good amount of confidence that no greater outrage has occurred recently. And so on, for sports blogs, celebrity gossip blogs, tech blogs, etc. etc.

This is a very important quality for a site or other source to have, because people want to be able to feel that they’re not missing something major. I think it’s a big part of what makes certain sites so successful. I know I’ll go to Drudge Report or Google News a few times during the day, if I can, for that very reason; if I see at the top something about Michael Jackson or whatever, I know that there hasn’t been a major bombing, plane crash or assassination anywhere. That, by itself, might be the most important piece of news on that page.