I referred before to Discourse DB, a site I helped create, as “the first true wiki database site”, defining a wiki database as a set of data that is editable by the entire world but functions like a database. Well, there was certainly room to question that statement, since there are other, pre-existing, sites that combine wiki and database functionality in various ways. ITerating, a product-review site, and WikiTree, a genealogy site, are two examples, not to mention all the other sites that run on Semantic MediaWiki, the technology that Discourse DB itself is built on top of.
Well, now I’m on somewhat firmer ground with my statement, with the creation of the Discourse DB analysis page. This page uses data obtained through Discourse DB’s data export, which is written in a format called RDF, and then queried using an RDF-specific query language called SPARQL. What does this mean? It means that anyone in the world can query Discourse DB to get its set of data. Even though the page is on the discoursedb.org domain, it’s going through the publicly-available interface to get the data, and in fact the querying to create this page was done on another server. And SPARQL is an open standard, so there’s nothing proprietary about the process.
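To give a flavor of what such a query looks like, here is a minimal Python sketch that builds a SPARQL query counting items per topic and encodes it as a GET request in the way the SPARQL Protocol describes. This is not the script behind the analysis page: the endpoint URL, the `ex:topic` property and both function names are placeholders invented for illustration, and the `COUNT`/`GROUP BY` syntax assumes a SPARQL 1.1 engine.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical values: neither the vocabulary nor the endpoint is Discourse DB's.
PREFIX = "http://example.org/property/"
ENDPOINT = "http://example.org/sparql"


def build_topic_count_query(limit=10):
    """Return a SPARQL query that counts items per topic, most frequent first."""
    return (
        f"PREFIX ex: <{PREFIX}>\n"
        "SELECT ?topic (COUNT(?item) AS ?n)\n"
        "WHERE { ?item ex:topic ?topic . }\n"
        "GROUP BY ?topic\n"
        "ORDER BY DESC(?n)\n"
        f"LIMIT {int(limit)}"
    )


def build_request_url(endpoint, query):
    """Encode the query text as the `query` parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    # Only prints the URL; no network request is made.
    print(build_request_url(ENDPOINT, build_topic_count_query(5)))
```

The same query text could just as well be run against a local copy of the RDF export loaded into any SPARQL-capable store; the point is only that nothing beyond the public data and an open query language is needed.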
If you check out the page you’ll also find some interesting information. Besides the basic type of information, like the political topics that appear most frequently, I programmed the script to get more in-depth information (the entire page was generated by a script). You can find out, for example, that:
- the single most-popular opinion for a column or editorial in Discourse DB to espouse is that coalition troops should not pull out of Iraq; the site has 36 editorials or columns arguing that view. That’s followed closely by the opinion that the “Military Commissions Act of 2006”, the act on the treatment of enemy combatants that was passed by Congress, should not have been passed; that’s an interesting matchup that suggests that there’s a divergence between what matters most to the commentariat on the left and on the right.
- the most controversial positions, meaning those with the closest split between authors arguing for and against them, are whether the United States should negotiate with Syria in order to improve the situation in Iraq, and whether the U.S. should build a fence along the Mexican border.
- the least controversial position is that China should put pressure on North Korea to end its nuclear ambitions: 21 editorials or columns have been written arguing that, and none against or even mixed on the issue.
- the two “authors” who have agreed on the most issues are The Wall Street Journal editorial board and The Washington Times editorial board, with 9 opinions in common. The individual authors who have agreed on the most issues are neoconservative writers William Kristol and Robert Kagan (not very interesting, since most of those columns were jointly-written).
- the two “authors” who have disagreed on the most issues are, maybe not surprisingly, The New York Times editorial board and The Wall Street Journal editorial board. The individual authors who have disagreed on the most issues are right-wing Charles Krauthammer and left-wing Chicago Tribune columnist Steve Chapman.
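For the curious, rankings like the ones above can be computed along these lines. This is a rough sketch, not the actual script: the sample records are made up, the function names are mine, and the "closeness of split" formula (the smaller side divided by the larger side, counting distinct authors) is just one reasonable reading of "closest split".

```python
from collections import defaultdict
from itertools import combinations

# Made-up sample records: (author, position, stance). Not real Discourse DB data.
ITEMS = [
    ("Author A", "Position X", "for"),
    ("Author B", "Position X", "against"),
    ("Author C", "Position X", "for"),
    ("Author A", "Position Y", "for"),
    ("Author B", "Position Y", "for"),
    ("Author C", "Position Y", "for"),
    ("Author A", "Position Z", "against"),
    ("Author B", "Position Z", "against"),
]


def split_score(items):
    """Map each position to min(for, against) / max(for, against).

    Counts distinct authors and ignores "mixed" stances. A score of 1.0 is an
    even split (most controversial); 0.0 means one side is empty.
    """
    sides = defaultdict(lambda: {"for": set(), "against": set()})
    for author, position, stance in items:
        if stance in ("for", "against"):
            sides[position][stance].add(author)
    scores = {}
    for position, s in sides.items():
        n_for, n_against = len(s["for"]), len(s["against"])
        scores[position] = min(n_for, n_against) / max(n_for, n_against)
    return scores


def pair_counts(items):
    """Count, per pair of authors, the positions they agree and disagree on.

    Simplification: if an author has several items on one position, only the
    last stance seen is kept.
    """
    stances = defaultdict(dict)  # position -> {author: stance}
    for author, position, stance in items:
        if stance in ("for", "against"):
            stances[position][author] = stance
    agree, disagree = defaultdict(int), defaultdict(int)
    for by_author in stances.values():
        for a, b in combinations(sorted(by_author), 2):
            if by_author[a] == by_author[b]:
                agree[(a, b)] += 1
            else:
                disagree[(a, b)] += 1
    return agree, disagree


if __name__ == "__main__":
    print(split_score(ITEMS))
    agree, disagree = pair_counts(ITEMS)
    print(dict(agree), dict(disagree))
```

Sorting the scores from highest to lowest gives the most and least controversial positions, and taking the largest values in `agree` and `disagree` gives the pairs of authors who line up or clash most often.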
Now, none of this is entirely scientific; I’m not planning to try to get these results published in a public policy journal. The biggest issue is the spottiness of the information; the site is built to be able to hold opinion columns and such from any time in the past, but in reality there isn’t much from before three months or so ago. So while I can’t really vouch for the amount of truth contained in the data, I think it’s a good proof-of-concept of wiki-database querying and maybe semantic web querying in general.