Archive for the 'Semantic MediaWiki' Category

Yaron’s European Vacation

Thursday, September 30th, 2010

My wife and I got back a little over a week ago from a nearly three-week vacation through Europe, covering both the South and the North. We took a lot of photos, but sadly I don’t have any of them online, and I wanted to get this post out relatively quickly, so you’ll have to paint pictures in your head using my words (or, you know, do a Google Images search).

The first part of the trip was a week-long yacht trip in Italy, starting in Rome, that wasn’t quite as luxurious as the word “yacht” might imply but was still pretty cool. It was all pre-arranged, with five passengers on board (the two of us, and friends of ours), two crew and one captain; though some passengers got involved in manning the boat by the end (not me; I was mostly content with sleeping on the deck, punctuated by watching the waves). We ate well on that trip - a lot of really good pasta, and all manner of cheese, and some nice pizza. Being a vegetarian, I missed out on all the interesting meats, like sopressata and wild boar, but you can’t really get enough fresh pasta, I say.

Our boat stopped a few times for swimming breaks in the Mediterranean Sea, which was great for me because I never quite got my “sea legs”, so being out in the actual water was a bit of a relief every time. Our ship had a tendency to attract swarms of jellyfish, or maybe they’re just everywhere in the ocean. I got stung by one, which didn’t hurt any worse than, say, a mosquito bite, but then somehow after that I got a reputation among the people on the boat for being fearless about swimming near them. What can I say; I didn’t see them as a threat - like me, they were in Italy in search of a good meal.

The Italy trip started and ended in Rome, and we got to see a lot of small towns and islands around there that I otherwise probably never would have seen. Mostly we were in places around Tuscany: Siena, Isola del Giglio, Massa Marittima, and some other even smaller towns whose names I can’t remember. We also saw the island of Elba, which I assumed would be desolate because that’s where Napoleon was banished to (for just a year, it turns out), but evidently there’s been some progress in the last 200 years, because it’s now a very touristy resort town, looking like Cannes or Acapulco or some such. I don’t know what it takes to get banished to Elba these days, but it’s worth looking into.

Then it was on to a few days in Madrid, where we stayed at the Room Mate Mario hotel, probably the “chillest” hotel I’ve ever stayed at: bossa nova soundtrack in the lobby, bright plastic decorations in the rooms. I’d stay there again any time.

We saw a flamenco show at the Cardamomo, which I guess has become popular among Americans since getting written about last year in the New York Times. As the NYT noted, it’s “unadorned flamenco” - no ruffles, no hats, just good music and tortured expressions. I thought it was great. We also drove around the city via the Go Cars, which are basically a motorized scooter that shouts out descriptions of the places you’re passing by, and directions to the next stop (it uses GPS to know where you are). Pretty neat, although it took a little while to get used to being pointed at by the locals. And Spanish food, even for a vegetarian, is great - it’s a country with a deep respect for the fried potato and the olive, two foods I really like.

The Netherlands capped off the trip. It may be just me, but two weeks of traveling as a tourist is about the maximum for me before I start to feel a little restless, after having moved along from place to place in search of food and entertainment. But the Netherlands provided a clean break from that, since there was some actual work to be done. First there were three days in the charming college town of Wageningen (the Dutch ‘g’ is pronounced like a guttural ‘ch’, as I found out), where I helped out some people at KeyGene with their Semantic MediaWiki installation. I had known for a while that Semantic MediaWiki has found a nice niche for itself in the biological sciences, which experience a relentless flow of new data, new terminology, new interconnections, all of which semantic wikis are well-suited for; but it was great to finally see, in detail, how that was applied to a very specific usage: in this case, genomics research on different plants. It was really eye-opening and felt great to see; like visiting the Royal Society in the 1700s while its members were discovering various stars and planets.

Then it was on to Amsterdam, where we stayed with an old friend of mine from New York who had helpfully moved to Amsterdam about four years ago. There I attended the Semantic MediaWiki Conference (SMWCon), which I plan to write a separate post about, on the WikiWorks blog. We also sampled some of the Amsterdam local culture, both with my friend and with some of the Semantic MediaWiki people, which was great - Amsterdam’s just a cool place to hang out in, undeniably.

And after almost three weeks, we ended up back in New York, tired yet refreshed, and me with about 80 items on my to-do list. Well, now I can check off one more.

SMW Camp thoughts

Wednesday, December 2nd, 2009

It’s been a hectic last month; we got back from our honeymoon last week, and I’m about to go travelling again, and I don’t know when or if I’ll have time to talk about all of it (I definitely hope to); but I did want to write about the amazing weekend we had at the SMW Camp in Karlsruhe, Germany. The conference was sponsored by Ontoprise, and took place in and around the Ontoprise office; mostly in a beautiful, loft-like glass-walled meeting room (here’s a representative photo, and here’s one of me talking). In the introduction, Daniel Hansch from Ontoprise, who served as the main host of the event, described Karlsruhe as “the capital of the Semantic Web”, which is only somewhat of an exaggeration.

About 40 people (that’s my guess) showed up, mostly from Germany but also from the Netherlands, Belgium and the U.S. The attendees came from what I consider the four branches of the Semantic MediaWiki world: academics, business people, hackers, and the smallest branch, government people (represented here by someone from the German Air Force). And the talks represented an equally broad spectrum, covering subjects from software tutorials to use cases of SMW to business methodologies to marketing. It felt like a real conference, with two solid days of talks and in-depth discussions.

Some lessons I think could be learned from the whole thing:

  • There’s not much need for the really introductory stuff. We thought that a lot of the crowd might be newcomers to the SMW “community”, and a lot of them were, but almost everyone knew about and had used SMW to some extent. In the future, it might make sense to cut out the introductory tutorials altogether.
  • On that note - there’s no shortage of things to discuss. We had two full days of talks, from 10 AM to 6 PM (actually 7 PM on Sunday), and even then some talks were rushed; and there was still subject matter we wanted to get to but couldn’t, like discussions about interface. Two full days is easily achievable.
  • Semantic MediaWiki isn’t ready for a true “unconference” yet. The name “SMW Camp” was chosen in part because it was supposed to reflect an “unconference” type of event organization: discussion topics decided at the event itself, with multiple tracks so that people can stick to the topics they’re interested in. But it turns out that, in the Semantic MediaWiki world, everyone’s pretty much interested in everything: people want to hear about academic research, ongoing development, performance issues, and corporate usage. We did have parallel tracks for one session, and even that led to some complaints from attendees that they couldn’t hear about topics that they were interested in. So it looks like for the foreseeable future we’ll stick with one track. There’s also the issue of a pre-defined schedule versus an open one decided on the same day. What seems to work for SMW is having participants put down the names of presentations they’d like to give, on the meeting’s wiki page, in the weeks beforehand; organizers can then construct a schedule from that just a day or two beforehand. It’s a semi-structured approach that somehow seems appropriate for an event that relates to semantic wikis. I still like the name “SMW Camp” for the meetings, by the way, because it seems more descriptive than any of the alternatives; and because it’s distinctive enough that it’s easy to tell which events are official components of the series, unlike with, say, the name “SMW Meeting”.
  • SMW Camp could probably become a bigger production. The “users meeting” we had in Boston last year was free; and I lobbied hard to make this one free as well - we compromised on 15 euros. Then I found out that many of the attendees were surprised that the price was so low, when conferences that they considered to be of comparable quality routinely cost anywhere from $100 to thousands of dollars (that’s the general rate; speakers sometimes attend for free, and students usually get a deep discount). It’s clear that many of the attendees, at least the non-student ones, would be willing to pay a higher amount. That by itself isn’t reason enough to raise the rate, of course, but extra money could help in a few ways: catering of events (though going out to restaurants was nice); free swag, like stickers, t-shirts and pens, that some people talked about having; and maybe even some extra money for the hosts, the organizers, people travelling from long distances, SMW developers… there’s no shortage of people one could give extra money to if one had it. :) Tied in with that is the idea of getting corporate or university sponsorship of the event, which, now that we have a proven track record, might be easier to do.

One interesting related thought is that I was trying right after the event to recall how this whole idea of SMW users meetings came to be, when I realized that the person indirectly responsible for them is Sergey Chernyshev; which is ironic because he has yet to attend one. But during the winter of 2008, he kept pestering me to start a New York MediaWiki meetup (he finally started one himself, a few months ago, which we now co-run, in theory). I kept demurring, saying I was only interested in something directly related to SMW, so he said to try organizing something like that instead. With that encouragement, in October 2008 I sent out an email asking if anyone would be interested in a New York users meeting; to my surprise, the interested responses all came from Boston, Seattle and Germany; some of them had previously talked about having an international meeting, but it hadn’t yet coalesced. The Boston meeting, which was hosted by eMonitor (now LeveragePoint) and which we jointly put together, happened quite quickly; literally a month later. So you can thank Sergey for helping to bring about a meeting he’s never attended, with people he doesn’t know. :)

Wikimania 2009 notes

Tuesday, September 15th, 2009

This email summarizes all the technical/Semantic MediaWiki parts of Wikimania 2009, in Buenos Aires, Argentina. Other highlights:

- getting to see Buenos Aires (and historic Colonia, Uruguay, just a ferry ride away). Buenos Aires is a beautiful city, with a nice-looking bridge; it looks quite a bit like a European city, but with much more political graffiti.

- seeing a keynote speech by Richard Stallman, the free-software pioneer, in which he both alienated and entertained the audience with his petulant attitude. Among other complaints, he was upset that Wikipedia doesn’t refer to Linux as “GNU/Linux”. See here for more than you’d really care to know about the whole issue.

- seeing all the talks getting translated into Spanish or English by in-person headset translators, which was pretty amazing; it felt like being at the UN.

- on that note, listening to some talks in Spanish, and being pleased to see that I could understand them without the headset translation. Although it helped that I knew the subject matter intimately ahead of time; I still can’t follow the telenovelas on Univision to save my life.

- the post-Wikimania party. There are some good dancers among the greater MediaWiki development community! I’m not naming any names, though.

Announcing Semantic Internal Objects

Thursday, August 20th, 2009

My latest extension: Semantic Internal Objects; this is either number 10 or 12, depending on how you count it; which is hard to believe. What is Semantic Internal Objects? In short, it lets you encode compound information, or what’s sometimes known as “n-ary relations”, within Semantic MediaWiki. If you want to record that, say, someone is president of a country, you can do that easily with SMW. But if you want to record that that person was president from a certain year to a certain other year, that hasn’t been possible in SMW until now, because it can’t be represented as a simple relationship (okay, actually, it has been possible, through multi-value properties, but I don’t consider those an ideal solution, for various reasons). Semantic Internal Objects (SIO) lets you do exactly that, using a new parser function. I’m very excited about this extension; I think it’ll open up a lot of possibilities for various SMW-based websites, but we’ll see…
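To give a rough idea of how this might look in practice - the property names here are invented, and the exact parameter syntax should be checked against the extension’s documentation - a page like “United States” might contain something like:

```wikitext
{{#set_internal:Has president
 |President name=Abraham Lincoln
 |Start year=1861
 |End year=1865
}}
```

The first parameter names the property that links the internal object back to the page it’s defined on; a query along the lines of {{#ask: [[Has president::United States]] |?President name |?Start year |?End year}} could then retrieve each presidency as its own result row, rather than as a jumble of values attached to one page.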

Semantic MediaWiki updates

Friday, July 31st, 2009
  • I was at the “NYC wiki-conference 2009”, held on the NYU campus, over the weekend; my thoughts about the conference are here. The one thing I forgot to mention, on a technical note, was a five-minute demo by Tom Maaswinkel, showing a MediaWiki wiki being edited via the soon-to-be-released Google Wave - it wowed the audience, as Google Wave demos tend to do.
  • Jeroen De Dauw released version 0.2 of Maps and Semantic Maps. These new versions have, among other improvements, support for Yahoo! geocoding, and just better-looking code, which is going to be important in the long run, as other developers get their hands on it and start tinkering with the code.
  • I added Maps and Semantic Maps to Referata - Semantic Google Maps will be gone shortly. That means mapping on Referata has a lot more options, and it’s already starting to bear fruit - check out the Google Earth option on Food Finds, for instance. Pretty nice!
  • Sergey Chernyshev and I released a new version of Semantic Bundle, which now includes Maps and Semantic Maps, replacing Google Geocoder and SGM. It’s really the beginning of the end for SGM, not counting the 30+ wikis it’s already on…
  • While working on the new Semantic Bundle version, I had the thought that SMW is starting to feel like a mature technology; in that it seems like the majority of the features that it will eventually have are already in place. The addition of the Semantic Maps extension had a lot to do with it, I think; this was one of the big chunks that I thought was still missing. There are still things left to be done, of course; I have a list of around 30, though they won’t necessarily be features that I implement. And I’m sure there will be various improvements behind the scenes, to speed up queries and the like. But I really feel like the Semantic MediaWiki system of the future won’t look all that different from what it looks like now, with the interplay of categories, templates, forms, properties, External Data calls, tables, maps, calendars, widgets, etc. (whew!) that you can already find in various SMW-based wikis. Though I could be wrong about this.

For Semantic MediaWiki, it’s a mappy day

Thursday, July 23rd, 2009

I’ve been working with Jeroen De Dauw, a student in the Google Summer of Code, on creating a full-scale mapping interface for Semantic MediaWiki for a few months now; by which I mean that he’s done the actual work, and I’ve been around to answer questions and try to bask in the glory. Anyway, I think mapping is crucial for any generic data project, because so much information that we need on a daily basis is location-based, whether it’s information about businesses, people, events, etc. There’s already an extension that handles all this stuff - Semantic Google Maps - but it’s incomplete, first because it relies on Google Maps, which not everyone can use, second because it doesn’t support the incredible Google Earth, and third because it can’t handle displaying locations on non-geographic surfaces (more on that later). Another extension, Semantic Layers, also exists, which uses the open-source OpenLayers mapping service, but it’s had some problems since the beginning that were never fully resolved.

Anyway, yesterday and the day before, Jeroen released the two extensions that he’s been working on, that are meant to provide the generic solution for all of SMW’s mapping needs: they are the Maps and Semantic Maps extensions. Here’s how the two work together: Maps handles the display of individual points, along with geocoding (determining the coordinates of a specific address); and Semantic Maps handles the display of multiple points on a map, defined via Semantic MediaWiki, as well as providing maps as Semantic Forms form inputs. Both support the same mapping services, currently three: Google Maps, Yahoo! Maps and OpenLayers.
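As a rough sketch of what this looks like in wikitext - the property and category names here are invented, and the exact result-format names may differ between versions, so treat this as illustrative rather than definitive:

```wikitext
<!-- On a page such as "Mario's Pizzeria", assuming a property of
     type Geographic coordinate named "Has coordinates": -->
[[Has coordinates::40.7306, -73.9866]]

<!-- Elsewhere, a query displaying every such page on a Google map;
     the format could presumably be swapped for the Yahoo! Maps or
     OpenLayers equivalent: -->
{{#ask: [[Category:Restaurant]]
 |?Has coordinates
 |format=googlemaps
}}
```

The division of labor described above means the annotation and the geocoding are handled by Maps, while the #ask query rendering all the points together comes from Semantic Maps.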

Jeroen has been keeping track of all the progress on his blog, which has a lot of information on all of this stuff, including some great screenshots, including this rather breathtaking one of Google Earth being used as a form input.

There’s still a month left in the Google Summer of Code, and Jeroen and I are excited about the extra cushion of time that provides, because it means that there’s an opportunity to add extra features to the system; like being able to show a clickable list of points near each map, so that maps can work more like this; and being able to use OpenLayers to display locations on non-geographic surfaces, such as images. That second one opens up a lot of possibilities, because it allows for things like annotated anatomical charts (see here for an example, from the Semantic Layers wiki) and displaying points on floorplans (see here for an example from the same wiki). For the latter, the example provided is for a video game, although you could easily imagine the same concept being used for more practical purposes, such as displaying events at a conference, or… showing the locations of enemy combatants in a building (hey, I’m allowed to fantasize a little, right?).

By a stroke of good timing, on Saturday I’ll actually be speaking at the New York City wiki-conference (basically a smaller-scale version of Wikimania), on the subject of all this mapping stuff; and hopefully being able to do a Steve-Jobs-at-Macworld thing, where I demo a recently unveiled technology to the crowd. Here’s a link to the panel I’ll be on: “Mapping in MediaWiki”. It’s free to attend, if anyone’s interested.

Semantic Bundle launched

Monday, June 22nd, 2009

Announcing Semantic Bundle - a single downloadable file that holds Semantic MediaWiki and 16 other MediaWiki extensions that use it and/or are often used in conjunction with it. The aim is to simplify the confusing landscape of extensions that’s evolved around Semantic MediaWiki, so that users can just get one file instead of having to research and download many files individually to get all the functionality they would want. What we have is a basic superset of the kinds of extensions people usually end up using on SMW-driven wikis (administrators can choose which of the extensions to include, once they’ve downloaded the bundle).

Semantic Bundle is similar to the SMW+ package distributed by Ontoprise, although it’s a different set of extensions; both include SMW, of course, but other than that the number of extensions they have in common is surprisingly small - which just goes to show how diverse the set of features has become, and may be another argument for this kind of “curatorial” work.

Semantic Bundle was developed, and is distributed, by Sergey Chernyshev and me.

SMW helps win contests, UPDATE: I can’t read very well

Tuesday, June 9th, 2009

Okay, all of the stuff I wrote before happened, but it was this time last year, not this year. I was off by an entire year. It’s still cool, though - maybe more impressive, actually, given how much functionality has been added to Semantic MediaWiki, etc. since last year. Anyway, what’s written below is not timely in the least.


This is cool. The company 23andMe creates reports for people on their genetic profiles - it doesn’t send anyone their entire DNA chain, but just notifies about the presence of SNPs (“snips”), which, as I understand it, are DNA sequences considered specifically informative. (The company’s also known for being founded by Google co-founder Sergey Brin’s wife, but I digress.) Anyway, in April they ran a contest in which they published the 23andMe data for an anonymous woman, and those who took part had to guess at as many of her attributes as possible. The winner was announced three weeks ago, and it was Mike Cariaso, whom I always enjoy talking to, and who runs the site SNPedia (pronounced “snipedia”). In his winning entry, he gave details for her race, hair and eye color, proclivity for diseases, and more intangible things like personality and intelligence. In their announcement of the winner, the company didn’t say which of the details were accurate, but if even half of them are, it’s a surprising (to me) level of detail.

In any case, the really neat thing is that Mike used SNPedia as the database to get all this information; and SNPedia is a wiki that runs on Semantic MediaWiki and Semantic Forms. So I think it’s great proof that SMW can compete with any technology out there at the moment as far as enabling open, collaborative databases. (Oh, and the prize is a free genetic screening, which sounds good if you’re into that sort of thing.)


Monday, June 1st, 2009

Lots of Semantic MediaWiki-related developments recently…

My name is ___, and I use SMW

Wednesday, May 20th, 2009

After some amount of planning, emailing and persuading, the Semantic MediaWiki testimonials page is now up. The page, as could be expected, holds a list of statements by various people about how SMW has helped them. There are eight testimonials already, featuring a good mix of contributions from corporations, research groups and individual websites. So now we have something that I think is rare: a testimonials page for an open-source application that has no organization running it. In other words, a marketing effort without the marketers, or even a CEO. Is this a harbinger of the future of work? I guess we’ll see. For now, I think this will be an important tool in getting companies and other organizations more comfortable with the idea of using SMW, especially in place of more slickly-marketed (but also more expensive) packages. And if you use Semantic MediaWiki and haven’t already submitted a testimonial, please feel free to do so - the email address is at the bottom of that page.

New extension: Admin Links

Wednesday, May 13th, 2009

I’m pleased to announce my latest extension, Admin Links, released earlier today; which, depending on how you count it, is around my ninth extension (a number I never would have guessed I would reach). I believe this is my conceptually simplest extension yet: just a page of links that are meant to be helpful for administrators. I think that this helps fix a hole in MediaWiki, though: I wrote before that I thought one of the top weaknesses of MediaWiki compared to competing systems was “lack of guidance from the interface about how administrators should accomplish their tasks”. Other applications have wizards, control panels and the like for helping administrators do their daily tasks, but when you first set up MediaWiki, there’s nothing looking back at you but a blank main page, and lots of pages of documentation elsewhere.

Admin Links provides the bare minimum, which is a page (at “Special:AdminLinks”) of links to common administrative tasks (like editing the CSS file, managing users, viewing a list of all the wiki’s pages). In addition, for administrators, it puts a link to this page within their “user links” - the links usually at the top of the page, like “my talk”, “my preferences”, etc. - so that an administrator can easily get to it from whatever page they happen to be on. Finally, Admin Links provides an API for letting other extensions add sections and links to the page, so that Special:AdminLinks can always serve as a control panel for whatever set of extensions are installed.

You can see an example of Admin Links at work here, on Discourse DB; though, since you’re not an administrator, you won’t see a link to it at the top. I’ve already modified my local versions of the Semantic MediaWiki and Semantic Forms extensions to call the Admin Links API, so you can see a lot of links geared for those two. I plan to check in the new Admin Links code for SMW and SF at some point soon, as well as to add similar calls to some of my other extensions.

The idea for this extension actually came from my wiki hosting site, Referata, which already has such a page for administrators (though there it’s called “Helpful links” - which will probably be replaced by Admin Links soon). And the idea for that, in turn, came because I realized the sheer volume of pages that people creating a Semantic MediaWiki site need to know about was making it hard for people to get started. So, in a very real sense, Admin Links is a Semantic MediaWiki-inspired extension; though of course it will most likely have usage beyond that. I should also note that it was the head of SMW, Markus Krötzsch, who came up with the insightful idea of implementing it as a general extension with an API, back when I discussed it with him a long while ago.

Semantic wikis are the future of systems integration

Monday, May 4th, 2009

The title’s a deliberate reference to the essay I wrote about a year and a half ago, “Semantic wikis are the future of information” (a sentiment I still fully agree with, by the way). But in the last few months, since the release of the External Data extension, I’ve had a new realization: that semantic wikis are not just a good tool for storing data, but for handling what’s sometimes known as enterprise application integration: coordinating among a set of systems in an enterprise.

First, the problem: it’s extremely common for mid-to-large organizations - whether they’re companies, non-profits, governments, etc. - to have their data scattered over many different systems. A company can have a database application for accounting, another for information on employees, another for customer service information, another for website traffic, etc. Other information, like legal agreements or information about business partners, might not even be located in any single location: it could be spread out over documents or emails throughout a company’s departments. And some of those documents might be in hard-copy only, not on a computer. Each data store can be quite useful for what it does (even printed documents in a manila folder somewhere can be a useful storage system); the problem is that the data can’t be combined in any meaningful way.

Let’s take a straightforward example: a manager wants to know whether employee pay and/or seniority in the customer service department affect the quality of customer service; they also want to know whether service calls about specific products correlate with visits to the website’s help pages about those products, or whether perhaps there’s an inverse correlation, indicating that more information should be added to the website about certain products. The information is all there, somewhere; the problem is that there’s no way to combine it, aggregate it, visualize it, etc.

This is a well-known problem, and a whole body of work exists around it, including journals, books, conferences, etc. The individual data stores are referred to as “information silos”, “islands of information”, “data stovepipes”, etc., while the task of integrating them has been called enterprise information management, and more recently “Enterprise 2.0” and “Business Intelligence 2.0”. And many companies exist to try to solve this problem for organizations, including IBM, SAP, etc.

Well, I believe that semantic wikis offer one solution to this problem: a solution that manages to be lightweight, fairly easy to implement, and, I think, powerful. This idea crystallized for me recently when I was working on a project to get Semantic MediaWiki into a large organization (I won’t say which it is, because discussions are still ongoing). This organization has all the typical problems of data in a large organization: inaccessible data and lack of any central control over all of it. During the course of planning, the group of us discussing how best to integrate SMW hit on what I think is a reasonable general approach. Here’s the idea: most data stays where it is, in the applications in which it was created; the only big change to each individual system is that each one is now responsible for providing an “API” for getting at its data: basically, a web script that, when passed the ID of some entity in the system (within the URL), displays the data for that entity in XML, CSV or JSON (three standard formats for displaying data). In some cases, this would just be a short script, maybe less than 10 lines, containing just a single SQL call; in most cases, it likely wouldn’t be a big technical challenge. (And to clarify further, the API, though it would be web-based, could still be behind a firewall; the information would not have to be opened to the public.)

This is where the wiki comes in: it would have a page for each entity, with each page containing a template call based on the type of data it represents. This template call would, in turn, extract the data for this page from the relevant data source (or data sources) via their API(s), using the External Data extension. This data would then be displayed to users, and also most likely stored via semantic properties, so that it could then be aggregated into lists, graphs, calendars, etc.
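A minimal sketch of that pattern, using External Data’s parser functions - note that the URL and all the field and property names here are hypothetical, made up for illustration - might be a template body along these lines:

```wikitext
{{#get_web_data:url=http://intranet.example.com/api/employees?id={{PAGENAME}}
 |format=json
 |data=name=full_name, salary=annual_salary, hired=hire_date
}}
'''Name:''' {{#external_value:name}}
{{#set:
 Has salary={{#external_value:salary}}
 |Has hire date={{#external_value:hired}}
}}
```

The #get_web_data call fetches the remote record and maps the external JSON fields to local variable names; #external_value displays them; and passing them to #set stores them as semantic properties, at which point they can be queried and aggregated like any other data on the wiki.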

The end result is a system in which no pre-existing component needs to know about any other component (only the wiki needs to know about everything), and parts can be brought in and out without bringing down the overall system. Also, it requires no programming, only some wiki-page scripting. And it’s based entirely on free, open-source software.

There’s one additional complication: components that don’t have database-backed data storage - data contained in files, or emails, or printed documents - which an organization would want to upgrade as part of an enterprise-integration process anyway. For these, the data could easily be moved onto the wiki, making use of what wikis were originally intended to do, which is storing text information. The flexibility of semantic wikis means that such a transfer could be done gradually, based on the needs of the organization. For a group of PowerPoint presentations, for instance, the wiki could start out as a directory containing the location of each file on the company’s file server, and then eventually come to semantically hold all the data contained in those files.

So there it is: a semantic wiki system (in this case, SMW, although if it takes off I’m sure other wikis will copy this functionality), plus custom APIs per system, provides the ability to do relatively pain-free data integration.

I’m not the first person to think of data integration by means of components publishing their own data; in fact, that’s been one of the suggested uses of so-called Semantic Web technology, where each component publishes data in a format like RDF or OWL, and semantic reasoners and SPARQL queries pull it all together. That, too, is a valid approach; my basic objection to it is that I think it’s overkill: you can easily get bogged down in a world of competing ontologies and mismatched data, if the goal is to get all the RDF outputs to be compatible with one another. With a wiki at the center, on the other hand, each component can just publish its data in the simplest format possible, and let the wiki deal with all the data-matching and exception-handling.

I’m in the Google Summer of Code

Thursday, April 23rd, 2009

I’m very pleased to say that, as was announced Monday, I’ll be mentoring one of the four projects for the Wikimedia Foundation in the 2009 Google Summer of Code. If you don’t know about the Google Summer of Code (or “GSoC”, as it’s affectionately called), it’s a fantastic program, fully funded by Google, that pays students around the world to work on established open-source projects over a summer. The student I’m mentoring is Jeroen De Dauw, a budding hacker in Belgium (and, coincidentally, one with a first name pronounced very similarly to mine, which is why some people when they first hear my name think I’m Dutch). He’s already got the requisite enthusiasm and programming experience that make me think the project will be a success.

The planned project is different from what’s described on the site, due to some re-thinking. The current plan is for Jeroen to create a new MediaWiki extension, called “Semantic Maps”, that will hold all support for mapping services: initially Google Maps and OpenLayers (replacing the current Semantic Google Maps and (not-really-working) Semantic Layers extensions), and then, as time permits, Google Earth and Yahoo! Maps as well.

This project was easily accepted, which was great; that was mostly luck, since not that many people signed up to mentor for Wikimedia this year, bringing to mind Woody Allen’s line that 80% of success is just showing up.

However idiosyncratic the process of getting accepted was, there’s nothing idiosyncratic about the project itself. Geographical mapping is a very important feature in data visualization; judging by this somewhat-reliable list of active SMW-using sites, Semantic Google Maps is the second most-popular additional extension for SMW sites, after Semantic Forms. That figure is for Google Maps specifically, and I don’t doubt that Google Maps will remain the most popular mapping service even as others become available; but all the others have their specific strengths and user bases: OpenLayers allows for mapping on non-geographic surfaces, like anatomical images and blueprints; Google Earth shows a 3-D view of the world; and Yahoo! Maps has fewer license restrictions than Google Maps does.

So that should be an exciting project; I’m also looking forward to just being a mentor. I’ll hopefully post some updates about Semantic Maps here as it gets developed.

Resolving MediaWiki and SMW weaknesses: discussion forums

Tuesday, April 7th, 2009

As the Semantic MediaWiki system becomes more mature and better-known, it’s encountering a new (and somewhat exciting) problem: it’s increasingly being matched up against other applications when large organizations evaluate it as a possible content-management/systems-integration/etc. solution. These other applications include, most notably, Microsoft SharePoint, but also “enterprise wikis” like Confluence and SocialText. And when these matchups occur, they inevitably bring the weaknesses and gaps in MediaWiki and SMW into focus. The weaknesses that I’ve personally heard raised in this way are:

  1. Lack of good WYSIWYG editing (there is a WYSIWYG-editing extension, FCKeditor, that works fine in most circumstances, and I’m in the minority who doesn’t think WYSIWYG editing for wikis is that necessary in the first place; but it’s been brought up as an issue)
  2. Lack of discussion forums
  3. Little to no access control, for being able to set who can read and/or edit which pages
  4. Lack of guidance from the interface about how administrators should accomplish their tasks
  5. A boring appearance - most MediaWiki sites tend to look almost exactly like Wikipedia, which itself doesn’t look that exciting
  6. Especially for Semantic MediaWiki (as opposed to MediaWiki itself), a skepticism about committing to a system that would require either training internal staff or keeping around consultants indefinitely

Those are the big ones, as far as I’m aware. It should be noted that issues of actual storage and display of data, which take up almost all of the focus of SMW discussions and development, don’t seem to have come up in evaluations of SMW at all, which I think indicates that SMW is far ahead of its competitors on data-related matters. That’s great news, though it does suggest that maybe our efforts should be re-prioritized to some extent.

I have some thoughts on how to deal with all of these, except for the first one, and they’re all worth having a discussion about (#3, the access-control issue, is probably worth having quite a few discussions about). But what I want to talk about in this post is issue #2, the lack of discussion forums in MediaWiki. I’ve heard it mentioned as a concern for three different large organizations in the last month, which I assume means that it’s a big issue and will stay that way until it’s solved.

I think the first thing that needs to be addressed, when talking about discussion forums, is that at least three different things fall into the realm of “discussion forums”, which may help explain why it’s been so hard to get a definitive solution. Here are what I see as the three things:

  1. Discussions about wiki pages - questions and conversations about the layout, content, data etc. of the pages in the wiki
  2. Discussions about the wiki’s topics - a place for people to talk, vent and argue about the actual subjects of each wiki page, independent of what the wiki pages happen to contain
  3. General discussions - forum-like discussions that may be unrelated to anything specifically in the wiki

The first kind of discussion is what MediaWiki’s “Talk” pages are geared for, and generally I think they work fine for that purpose. You could make the case that this system could use some improvement - there’s no reason why users should be able to edit others’ comments, for instance - but I haven’t seen any major problems with them, and extensions already exist, like LiquidThreads, that make Talk pages more forum-like.

The second kind of discussion is unique to public wikis - wikis that are meant to attract a general readership, where there will be a set of users who want to read the contents and comment on the topics, without modifying the content itself. On Wikipedia such comments are simply not allowed, which I think is the right thing to do for a mass-audience reference. But for more-specific sites, meant to attract people interested in one particular set of topics, allowing general venting and discussion makes sense.

The current best way to do this, in my opinion, is to have such comments be handled by an outside system. The OpenCongress wiki handles them this way: the wiki page on the Employee Free Choice Act, for instance, links to OpenCongress’ main page on this bill (at least, the House version), which itself has a tab for the comments page. The flow could be a little nicer, but the system provides a clear location for comments. Of course, in the case of OpenCongress, the non-wiki site, with comments pages, already existed before the wiki was set up, so it was obvious which approach to take. In the case of a wiki without an external site attached, there’s no good, easy solution at the moment.

I believe such a solution is important; I also believe that it should be implemented in some way outside the wiki - in other words, comments should be entered as HTML, not wikitext, and they shouldn’t be editable once they’re entered. I also don’t know whether comments pages should use the wiki’s user-registration system: commenting systems on blogs and such generally seem to work fine without registration, and I believe it might be important to maintain a separate “identity” between making changes to the wiki and expressing one’s personal opinions. For all those reasons, I think it’s a bad idea to use Talk pages for that purpose, although it’s tempting. (And there’s also the fact that Talk pages are already used for discussions about the wiki content.)

So that leaves finding some way for comment pages to be integrated into the wiki. This definitely could use more thought and discussion.

The third kind of discussion is just discussions in general, potentially on any topic, that people who read and edit the wiki would want to have specifically with one another. For a private wiki in an organization, this would just be a forum for employees/members to talk; for a public wiki on a specific topic, it would be a forum devoted to that topic. Here the argument for integrating the discussion directly into the wiki is weakest, since plenty of good forum software already exists, like phpBB, and a MediaWiki extension would never be able to match that functionality (some people have tried creating forms using Semantic Forms to enable such a thing, but I don’t think that will ever work nearly as well as dedicated software). However, it’s definitely worth creating, at the very least, a “best practices” document explaining how MediaWiki and forum software can be used together and linked to one another, and possibly how to integrate their user-registration systems, using OpenID or something else.

So that’s what I think about discussions in MediaWiki. I may get around to writing about the other issues; let me know in the comments if there are any you specifically want to hear my thoughts on, and of course feel free to share your own thoughts.

A longer-than-expected post about External Data and the OpenCongress wiki

Wednesday, March 18th, 2009

I’m well overdue, but here, finally, is my full explanation of the External Data MediaWiki extension; there have been quite a few improvements to it since even the overhauled release, so maybe some of the delay was justified… at least, I’d like to think so.

First of all, you can see the PDF slides from my conference-call presentation here.

The basic goal of External Data is to allow structured data from the outside world to be displayed, and otherwise used, in a wiki. There are lots of APIs out there on the web, with more coming all the time, and this extension allows them to be accessed in a very lightweight manner: no need to specify an XML XPath structure or a SPARQL query (and if you don’t know what those terms mean, all the better for you). You just declare the URL you want to access and the variables you want to retrieve, and it does the work. And, as I noted earlier, since Semantic MediaWiki provides its own web-based API for accessing data, you can also use External Data to display data from one SMW-based wiki in another.
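To make the “declare a URL and the variables you want” model concrete, here’s a minimal Python sketch of the retrieval step (the extension itself does this in PHP inside MediaWiki; the column names and URL here are hypothetical):

```python
import csv
import io
import urllib.request

def parse_csv_values(text, variables):
    """Map each requested column name to its list of values -
    roughly what External Data does after retrieving a URL."""
    reader = csv.DictReader(io.StringIO(text))
    result = {var: [] for var in variables}
    for row in reader:
        for var in variables:
            result[var].append(row.get(var, ""))
    return result

def get_external_data(url, variables):
    """Fetch a CSV-returning URL and extract the named columns."""
    with urllib.request.urlopen(url) as response:
        text = response.read().decode("utf-8")
    return parse_csv_values(text, variables)
```

The point is how little the caller has to say: a URL, a format, and some variable names, with no schema or query language in sight.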

As I noted in the presentation, though, the vast majority of the world’s data is not accessible via a web-based API and never will be. Instead, it’s contained in database tables or Excel spreadsheets, or in even less-queryable sources: books, paper documents, etc. If there’s a set of data like that that we want to use in our wiki, how do we do that? Yes, we can go the Wikipedia route of just manually entering the data wherever it’s necessary. However, this leads to a lot of redundant work, and avoiding that work is most of the reason we use semantic wikis in the first place. The next-best approach involves using Semantic MediaWiki: you import the data into wiki pages using some sort of automated tool, with the pages containing either direct semantic annotations or template calls that translate into semantic annotations. The data then gets stored in SMW’s data tables, where it can be queried. This approach, as far as I know, has already been taken in a few places; however, it has a major problem: if the original data gets changed or expanded, it’s very hard to re-import it, because now you have to merge it with whatever changes have been made by users on the wiki.

The ideal solution is to keep the data where it is and create an API for accessing it; however, most of the time that’s not feasible (it’s beyond most organizations’ abilities to create a web service for getting the data from an Excel spreadsheet, for instance). So the “enhanced” External Data allows for what I think is the next-best solution: you put the data into its own wiki page, in CSV format (basically the simplest kind of data format there is; all the values are just separated by commas). The page ‘Special:GetData’, defined by External Data, then serves as a “mini-API” for accessing this data: it takes in the name of a CSV-holding wiki page, and an optional set of criteria, and returns the set of rows that match those criteria. This gives you all the benefits of having an API: the outside world can easily access your data, and you can access it yourself on other wiki pages, using External Data’s standard querying. For some examples of the latter, see this test page on Discourse DB.
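From a client’s point of view, hitting that mini-API is just an HTTP GET, with the CSV page’s name in the URL and any filter criteria in the query string. The exact URL shape below is an assumption for illustration, not the extension’s documented interface:

```python
import urllib.parse

def getdata_url(wiki_base, csv_page, criteria=None):
    """Build a Special:GetData URL: the CSV-holding page goes in
    the path, and each filter criterion becomes a query parameter
    that a row's column value must match. (Hypothetical URL shape;
    check the extension's documentation for the real one.)"""
    path = "Special:GetData/" + urllib.parse.quote(csv_page)
    url = wiki_base.rstrip("/") + "/" + path
    if criteria:
        url += "?" + urllib.parse.urlencode(criteria)
    return url
```

Fetching that URL would then return only the matching rows, which is what makes a plain CSV-holding wiki page behave like an API endpoint.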

But, you may say, importing the data into a wiki page causes all the same problems we were trying to avoid in the first place! After all, it can still be modified by users after the import, making it difficult to re-import. That’s true, but at least the data is now separated from free text, formatting and other things that users may want to be involved with, so the chance of users modifying any of the pages that hold the actual data becomes much smaller; it’s generally a clean solution.

So that’s External Data. The other big wiki-related news is that the project I was working on for the last few months was released two weeks ago (I’m really behind on this stuff): the OpenCongress wiki. It’s meant to work in conjunction with OpenCongress, a site that holds information about the politicians, committees, legislation and campaign contributions of the U.S. Congress. The wiki holds a subset of that information, and it’s of course user-editable. As you can see from the wiki’s version page, it uses Semantic MediaWiki and many of the extensions that have become generally associated with it, including my Semantic Forms and Semantic Drilldown extensions and Sergey Chernyshev’s indispensable Widgets and Header Tabs extensions; all are meant to enable a data-centered approach to the wiki.

I bring it up in this same post because the OpenCongress wiki also uses External Data; actually, in my mind the site and the extension are somewhat interwoven, because External Data was created during my work on the OpenCongress wiki, was inspired by it to a large extent, and certainly got its first usage (and debugging) in the site. External Data is used in various places in the OpenCongress wiki, both to access data from outside APIs (like that of Sunlight Labs), and to handle data that has no API, using the “CSV page” approach. You can see an example of the latter here - a wiki page containing raw data on one organization’s “scorecard” for one year’s senate membership. You can see that data then being displayed here, on the page for Senator Barbara Boxer, using External Data (currently only this one scorecard’s data is displayed for all senators, but I believe it’s due to get expanded soon).

The OpenCongress wiki is a fantastic site for people looking for political information; in addition, I think that, for wikis, it represents the shape of things to come. That’s because it serves as a “mashup” of many different data sources, allowing for much more information brought to the user than relying on just the wiki’s own editors would. Different types of data are brought together in a relatively seamless way: free text written by regular wiki users; semantic data entered through forms; data from OpenCongress’ own database; data from outside APIs; data that’s not otherwise web-accessible (like the scorecard information); outside services like social-bookmarking tools and Google Maps; and “feed” sources like YouTube and Twitter. External Data, and the large and growing number of great data APIs around the web, make this so easy to do that I’d imagine it won’t be long before other wikis start to follow this same strategy.

Finally, on a side note, it might be mischievous of me to note that all this data integration is done without the use of RDF… but, whatever you think of RDF, that is the truth.

External Data 2.0 (actually 0.4, but same thing)

Wednesday, February 11th, 2009

Yesterday I released a new version of the External Data extension that allows it to, in addition to its previous functionality, get a table’s worth of data (instead of just single values), and extract data from any wiki page holding values in CSV format. The more I think about it, the more I think these additions make External Data one of the most important MediaWiki extensions I’ve released (or co-released, since Michael Dale contributed), or maybe even the most important, beating out Semantic Forms; I guess we’ll see.

I hope to write more about the “new” External Data at some point; for now, if you want to hear more about it and what I think its implications are, I’ll be talking about it tomorrow (Thursday) at 1:30 PM EST in session 5 of the semantic wiki conference call “mini-series”. Anyone is free to join in the call. There’ll also be other talks (including three from people I met at the Semantic MediaWiki users meeting in Boston) that should be quite interesting.

How Wikipedia enabled semantic wikis

Friday, January 16th, 2009

It turns out that yesterday was the eighth anniversary of the launch of Wikipedia, otherwise known as Wikipedia Day. So it’s probably as good a time as any to acknowledge the huge impact Wikipedia has had on my own career. It’s rare to say that a website has single-handedly brought into being an entire field of technology, but one could make a case that Wikipedia has done that for my field, semantic wikis - which is, of course, ironic, since Wikipedia itself does not use semantic technology. The site, though, has enabled what I do to come about in several different ways - enough that it’s hard to dispute the direct link. Here are the ways in which Wikipedia has made semantic wikis possible:

  • It taught the world about wikis. Most people, when they first heard about Wikipedia, a site where anyone can edit anything, probably had the same reaction: sounds like a recipe for disaster. To be sure, some critics of Wikipedia still say that’s the case; but for most of the hundreds of millions of people who read the site, seeing it work has been an eye-opening experience: the realization that a site where users can edit the content of any page can work. And for some users (including me), the realization that it’s not only a workable solution, and not just the best solution, but in some cases the only solution for aggregating information in one place. And so Wikipedia’s proof-of-concept inspired many people to create their own wikis for their businesses, organizations or personal interests. I dare say that 99% of the people who have been involved with semantic wikis got their first experience with wikis by reading Wikipedia; I’m part of that group.
  • It has inspired researchers. Beyond Wikipedia as a proof-of-concept, the idea of turning Wikipedia into a more database-like information store has captured the imaginations of a lot of people. That’s how Semantic MediaWiki got its start: the first paper published about the project was titled “Semantic Wikipedia”, and the concept remains the holy grail for many of those involved with the project (not for me personally, though I can understand the excitement). And Freebase, the other major semantic wiki technology (in my opinion), which uses its own proprietary application, has billed itself as a “Wikipedia for data”; I wouldn’t be surprised if it was conceived that way, too. (It’s an open question what will happen to Freebase if Wikipedia goes semantic, and thus itself becomes the Wikipedia for data.)
  • It has enabled the technology. MediaWiki, the wiki engine developed specifically for Wikipedia, is also, in my opinion, the best wiki engine, of the dozens that exist. It’s robust, scalable, and full of useful features. Two of those features have, I think, made it ideally suited for use in semantic wikis: templates and hooks. Templates enable the separation of data from data structure and presentation, which lets a semantic wiki approximate much more closely a regular database-driven website; while hooks, of which MediaWiki has hundreds, allow extensions like Semantic MediaWiki to integrate nicely into the rest of the package with little or no coordination between the extension developers and the main MediaWiki developers: that, in turn, allows for much faster development time. Neither one is a coincidence: the nature of Wikipedia and its massive size make conveniences like these into something more like necessities.
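The hook pattern described above is language-agnostic in spirit; here’s a minimal Python analogue (not MediaWiki’s actual PHP API) showing how a core system can fire named events without knowing which extensions, if any, are listening:

```python
# A minimal hook registry: the core fires named events, and
# extensions attach handlers without the core knowing about them.
hooks = {}

def add_hook(name, handler):
    """Called by an extension at load time to register a handler."""
    hooks.setdefault(name, []).append(handler)

def run_hook(name, *args):
    """Called by the core at a well-defined point; runs every
    registered handler and collects their results."""
    results = []
    for handler in hooks.get(name, []):
        results.append(handler(*args))
    return results

# A hypothetical "extension" registers for a core event:
add_hook("PageSaved", lambda title: f"reindexed {title}")
```

The core never imports the extension; the extension never patches the core. That decoupling is what lets something like SMW slot into MediaWiki with minimal coordination between the two teams of developers.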

So, a big thank you to Wikipedia, and of course to its two co-founders: Larry Sanger, who had the idea to use a wiki to power the world’s first free online encyclopedia; and Jimmy Wales, who has guided the project successfully ever since.

New MediaWiki extension: External Data

Tuesday, January 13th, 2009

I’m pleased to announce External Data, my new MediaWiki extension; this is somewhere between my sixth and ninth released extension, depending on how you count. External Data allows wiki pages to use and display values retrieved from an outside URL that itself holds XML or CSV data. It’s a very simple extension (my smallest one, I think), but I think it has some important implications for SMW. Using it, one Semantic MediaWiki-based site can get data from another, using a query with the ‘CSV’ format, and then store it semantically. See here for an example of that usage, on Discourse DB - it displays and then semantically stores data that was retrieved from this page on another wiki. Check out the source code of the first page for the specifics of how it’s done. This means that the information from two or more semantic wikis can now be combined in one place, then queried, mapped, etc., as if it were all just one wiki’s data.

This idea of pooling data from different websites is of course the main concept behind the so-called Semantic Web (not a term I like all that much, but that’s a different story). At the moment, I can’t imagine that this extension will be used much for the classic semantic-web example, of gathering data from completely unrelated wikis (or what could be called a “mashup”); but for wikis and other online data sources that have already coordinated among themselves to split up the handling of data, I think it’s a very reasonable solution for doing that.

New Semantic MediaWiki hosting site

Monday, January 5th, 2009

The site Pseudomenon, which appears to have just been released yesterday, is the newest entrant to the small club of semantic wiki hosting sites. This is, as far as I know, the third site to offer hosting of Semantic MediaWiki, and the fourth to offer hosting of any sort of semantic wiki, the one non-SMW site being Swirrl. It’s the first, though, to support the Halo extension (also known as “SMW+”), which allows free-form semantic annotation and querying of wiki pages. Pseudomenon doesn’t include any other extensions at the moment, but the inclusion of Halo by itself makes it a helpful addition.

According to the main page, hosting is free, and every wiki gets its own subdomain.

Apparently, the word “pseudomenon” is a reference to the Epimenides paradox, in which a Cretan stated “all Cretans are liars”. A snide commentary on truth in wikis? Well, at least it’s a real word, as opposed to the fake-Latin “Referata” I came up with, though I later found out that means, I believe, “reports” in Croatian.

Semantic MediaWiki conference call

Tuesday, December 9th, 2008

Do you like reading about semantic wikis, but really wish you could hear me talking on the phone about them? Well, you’re in luck, because I’ll be speaking in the 3rd session of the semantic wiki “mini-series” of conference calls, on Thursday. The last two sessions, which happened over the last two months, covered the broader world of semantic wikis; this one focuses specifically on Semantic MediaWiki. Markus Krötzsch, the lead developer of SMW, will talk about the core of the technology, and I’ll talk about “Semantic Forms, Semantic Drilldown, Semantic Result Formats, Semantic Google Maps, Semantic Compound Queries and Data Transfer” (evidently, I get bored easily). There will also be people from the Ontoprise corporation presenting their contributions, and some other presenters. Each presentation will also have a real-time slide show on the web. You can see the presentation time and phone number here (it depends on where you live), plus other details, and a place to RSVP (you don’t need to RSVP to watch/listen, but it’s strongly recommended).