A longer-than-expected post about External Data and the OpenCongress wiki

I’m well overdue, but here, finally, is my full explanation of the External Data MediaWiki extension; there have been quite a few improvements to it since even the overhauled release, so maybe some of the delay was justified… at least, I’d like to think so.

First of all, you can see the PDF slides from my conference-call presentation here.

The basic goal of External Data is to allow structured data from the outside world to be displayed, and otherwise used, in a wiki. There are lots of APIs out there on the web, with more coming all the time, and this extension allows them to be accessed in a very lightweight manner: no need to specify an XML XPath structure, or a SPARQL query (and if you don’t know what those terms mean, all the better for you): you just declare the URL you want to access, and the variables you want to retrieve, and it does the work. And, as I noted earlier, since Semantic MediaWiki provides its own web-based API for accessing data, you can also use External Data to display data from one SMW-based wiki in another.
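To illustrate the "declare the variables you want" idea, here is a minimal Python sketch. It is not External Data's actual code (the extension is written in PHP), and the function name `extract_variables` and the sample response are hypothetical. It assumes the API returns JSON and that values are matched by key name wherever they appear in the structure, so the caller never has to spell out a path.

```python
import json


def extract_variables(data, names):
    """Collect the values of the named keys, wherever they occur in nested data."""
    found = {name: [] for name in names}

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                # Only simple (non-container) values are collected.
                if key in found and not isinstance(value, (dict, list)):
                    found[key].append(value)
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(data)
    return found


# Hypothetical API response; a real one would be fetched from the declared URL.
SAMPLE_RESPONSE = json.loads("""
{"results": [
  {"legislator": {"name": "Alice Example", "state": "CA"}},
  {"legislator": {"name": "Bob Sample", "state": "TX"}}
]}
""")

values = extract_variables(SAMPLE_RESPONSE, ["name", "state"])
```

Here `values["name"]` holds both names and `values["state"]` both states, without the caller knowing that they sit under `results` and `legislator`.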

As I noted in the presentation, though, the vast majority of the world’s data is not accessible via a web-based API and never will be. Instead, it’s contained in database tables or Excel spreadsheets, or in even less-queryable sources: books, paper documents, etc. If there’s a set of data like that that we want to use in our wiki, how do we do that? Yes, we can go the Wikipedia route of just manually entering the data in wherever it’s necessary. However, this leads to a lot of redundant work, and avoiding it is most of the reason we use semantic wikis in the first place. The next-best approach involves using Semantic MediaWiki: you import the data into wiki pages using some sort of automated tool, with the pages containing either direct semantic annotations or template calls that translate into semantic annotations. The data then gets stored in SMW’s data tables, where it can be queried. This approach, as far as I know, has already been used in a few places; however, there’s a major problem with it: if the original data gets changed or expanded, it’s very hard to re-import it, because now you have to merge with whatever changes have been made by users on the wiki.

The ideal solution is to keep the data where it is and create an API for accessing it; however, most of the time that’s not feasible (it’s beyond most organizations’ abilities to create a web service for getting the data from an Excel spreadsheet, for instance). So the “enhanced” External Data allows for what I think is the next-best solution: you put the data into its own wiki page, in CSV format (basically the simplest kind of data format there is; all the values are just separated by commas). The page ‘Special:GetData’, defined by External Data, then serves as a “mini-API” for accessing this data: it takes in the name of a CSV-holding wiki page, and an optional set of criteria, and returns the set of rows that match those criteria. This gives you all the benefits of having an API: the outside world can easily access your data, and you can access it yourself on other wiki pages, using External Data’s standard querying. For some examples of the latter, see this test page on Discourse DB.
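The "mini-API" behavior described above (take CSV text, apply optional criteria, return the matching rows) can be sketched in a few lines of Python. This is only an illustration of the idea, not the code behind Special:GetData; the function name, the column names and the sample data are all made up, and the criteria are assumed to be simple exact matches on column values.

```python
import csv
import io


def get_matching_rows(csv_text, criteria=None):
    """Return the CSV rows (as dicts keyed by the header row) matching all criteria."""
    criteria = criteria or {}
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row
        for row in reader
        if all(row.get(column) == value for column, value in criteria.items())
    ]


# Hypothetical contents of a CSV-holding wiki page.
SCORECARD_PAGE = """Senator,State,Score
Alice Example,CA,90
Bob Sample,TX,40
Carol Placeholder,CA,75
"""

california_rows = get_matching_rows(SCORECARD_PAGE, {"State": "CA"})
all_rows = get_matching_rows(SCORECARD_PAGE)
```

With no criteria, every row comes back; with `{"State": "CA"}`, only the two rows whose State column is exactly `CA` do.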

But, you may say, importing the data into a wiki page causes all the same problems we were trying to avoid in the first place! After all, it can still be modified by users after the import, making it difficult to re-import. That’s true, but at least the data is now separated from free text, formatting and other things that users may want to be involved with, so the chance of users modifying any of the pages that hold the actual data becomes much smaller; it’s generally a clean solution.

So that’s External Data. The other big wiki-related news is that the project I was working on for the last few months was released two weeks ago (I’m really behind on this stuff): the OpenCongress wiki. It’s meant to work in conjunction with OpenCongress, a site that holds information about the politicians, committees, legislation and campaign contributions of the U.S. Congress. The wiki holds a subset of that information, and it’s of course user-editable. As you can see from the wiki’s version page, it uses Semantic MediaWiki and many of the extensions that have become generally associated with it, including my Semantic Forms and Semantic Drilldown extensions and Sergey Chernyshev’s indispensable Widgets and Header Tabs extensions; all are meant to enable a data-centered approach to the wiki.

I bring it up in this same post because the OpenCongress wiki also uses External Data; actually, in my mind the site and the extension are somewhat interwoven, because External Data was created during my work on the OpenCongress wiki, was inspired by it to a large extent, and certainly got its first usage (and debugging) in the site. External Data is used in various places in the OpenCongress wiki, both to access data from outside APIs (like that of Sunlight Labs), and to handle data that has no API, using the “CSV page” approach. You can see an example of the latter here - a wiki page containing raw data on one organization’s “scorecard” for one year’s senate membership. You can see that data then being displayed here, on the page for Senator Barbara Boxer, using External Data (currently only this one scorecard’s data is displayed for all senators, but I believe it’s due to get expanded soon).

The OpenCongress wiki is a fantastic site for people looking for political information; in addition, I think that, for wikis, it represents the shape of things to come. That’s because it serves as a “mashup” of many different data sources, allowing for much more information brought to the user than relying on just the wiki’s own editors would. Different types of data are brought together in a relatively seamless way: free text written by regular wiki users; semantic data entered through forms; data from OpenCongress’ own database; data from outside APIs; data that’s not otherwise web-accessible (like the scorecard information); outside services like social-bookmarking tools and Google Maps; and “feed” sources like YouTube and Twitter. External Data, and the large and growing number of great data APIs around the web, make this so easy to do that I’d imagine it won’t be long before other wikis start to follow this same strategy.

Finally, on a side note, it might be mischievous of me to note that all this data integration is done without the use of RDF… but, whatever you think of RDF, that is the truth.

49 Responses to “A longer-than-expected post about External Data and the OpenCongress wiki”

  1. Kingsley Idehen Says:

    The reason why RDF matters simply comes down to the fact that the “raw data” is available in structured form (an entity-attribute-value graph) and amenable to meshing across regular and irregular boundaries.

    RDF is not about presentation-level integration; it’s about data integration with negotiable representation of the resulting data meshes.

    See example:
    1. http://linkeddata.uriburner.com/about/html/http://www.govtrack.us/congress/person.xpd?id=300011

  2. Yaron Says:

    Well, I’m definitely aware of the arguments in favor of RDF. I will say, though, that External Data, in conjunction with other tools like Semantic MediaWiki, does allow for data integration, not just presentation-level integration. The two big advantages that RDF has, it seems to me, are that you can do very complex querying on the data, and you can access data whose structure you don’t know ahead of time. How well those work in practice, and how useful they are when they do work, are up to the reader to decide.

  3. Stephen Larson Says:

    Really cool that you are engaging the OpenCongress Wiki. I notice that at the moment a lot of content is still in plain text or standard templates rather than taking advantage of the Semantic MediaWiki properties. Will there be a conversion process?

  4. Yaron Says:

    Thanks. I don’t know the answer to that - a lot of the data is being stored or retrieved semantically (it might be more than you noticed), although it’s true that a good amount isn’t, and I don’t know if there’s any plan to deal with the rest of it.
