The title’s a deliberate reference to the essay I wrote about a year and a half ago, “Semantic wikis are the future of information” (a sentiment I still fully agree with, by the way). But in the last few months, since the release of the External Data extension, I’ve had a new realization: that semantic wikis are not just a good tool for storing data, but for handling what’s sometimes known as enterprise application integration: coordinating among a set of systems in an enterprise.
First, the problem: it’s extremely common for mid-to-large organizations - whether they’re companies, non-profits, governments, etc. - to have their data scattered over many different systems. A company can have a database application for accounting, another one for information on employees, another for customer service information, another for website traffic, etc. Other information, like legal agreements or information about business partners, might not even be located in any single location: it could be spread out over documents or emails throughout a company’s departments. And some of those documents might be in hard-copy only, not on a computer. Each data store can be quite useful for what it does (even printed documents in a manila folder somewhere can be a useful storage system); the problem is that the data can’t be combined in any meaningful way.
Let’s take a straightforward example: a manager wants to know whether employee pay and/or seniority in the customer service department affect the quality of customer service; they also want to know whether service calls about specific products correlate with visits to the website’s help pages about those products, or whether perhaps there’s an inverse correlation, indicating that more information should be added to the website about certain products. The information is all there, somewhere; the problem is that there’s no way to combine it, aggregate it, visualize it, etc.
This is a well-known problem, and a whole body of work exists around it, including journals, books, conferences, etc. The individual data stores are referred to as “information silos”, “islands of information”, “data stovepipes”, etc., while the task of integrating them has been called enterprise information management, and more recently “Enterprise 2.0” and “Business Intelligence 2.0”. And many companies exist to try to solve this problem for organizations, including IBM, SAP, etc.
Well, I believe that semantic wikis offer one solution to this problem, a solution that manages to be lightweight, fairly easy to implement, and, I think, powerful. This idea crystallized for me recently when I was working on a project to get Semantic MediaWiki into a large organization (I won’t say which it is, because discussions are still ongoing). This organization has all the typical problems of data in a large organization: inaccessible data and lack of any central control over all of it. During the course of planning, the group of us discussing how best to integrate SMW hit on what I think is a reasonable general approach. Here’s the idea: most data stays where it is, in the applications in which it was created; the only big change to each individual system is that each one is now responsible for providing an “API” for getting at its data: basically, a web script that, when passed the ID of some entity in the system (within the URL), displays the data for that entity in XML, CSV or JSON (three standard formats for displaying data). In some cases, this would just be a short script, maybe less than 10 lines, containing just a single SQL call; in most cases, it likely wouldn’t be a big technical challenge. (And to clarify further, the API, though it would be web-based, could still be behind a firewall; the information would not have to be opened to the public.)
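To make that concrete, here’s a minimal sketch in Python of the “single SQL call” kind of script described above - given an entity’s ID, look up its record and return it as JSON. The `employees` table and its columns are purely hypothetical, and a real deployment would wrap this function in a small web script that reads the ID from the URL’s query string:

```python
import json
import sqlite3

def entity_as_json(db_path, entity_id):
    """Look up one record by ID and serialize it as JSON.

    Table and column names here are illustrative only; each system
    would substitute its own schema and database connection.
    """
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row  # lets us turn the row into a dict
    row = conn.execute(
        "SELECT id, name, department, salary FROM employees WHERE id = ?",
        (entity_id,),
    ).fetchone()
    conn.close()
    # Return an empty JSON object if the ID doesn't exist
    return json.dumps(dict(row)) if row else json.dumps({})
```

The point is just how little code is involved: one query, one serialization step, and the system’s data is now consumable by anything that can fetch a URL.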
This is where the wiki comes in: it would have a page for each entity, with each page containing a template call based on the type of data it represents. This template call would, in turn, extract the data for this page from the relevant data source (or data sources) via their API(s), using the External Data extension. This data would then be displayed to users, and also most likely stored via semantic properties, so that it could then be aggregated into lists, graphs, calendars, etc.
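As a rough illustration, an employee template might contain wikitext along these lines, using External Data’s #get_web_data and #external_value parser functions (the URL, field names, and “Has salary” property here are all made up for the example):

```wikitext
{{#get_web_data:url=http://intranet.example.com/api/employee?id={{{id|}}}
|format=JSON
|data=name=name,department=department,salary=salary}}

'''Name:''' {{#external_value:name}}

'''Department:''' {{#external_value:department}}

'''Salary:''' [[Has salary::{{#external_value:salary}}]]
```

The last line both displays the value and stores it as a semantic property, which is what makes the later aggregation into lists, graphs, and calendars possible.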
The end result is a system in which no pre-existing component needs to know about any other component (only the wiki needs to know about everything), and parts can be brought in and out without bringing down the overall system. Also, it requires no programming, only some wiki-page scripting. And it’s based entirely on free, open-source software.
There’s one additional complication: components that don’t have database-backed data storage - data contained in files, or emails, or printed documents - which an organization would want to upgrade anyway as part of an enterprise-integration process. For these, the data could easily be moved onto the wiki, making use of what wikis were originally intended to do: store text information. The flexibility of semantic wikis means that such a transfer could be done gradually, based on the needs of the organization. For a group of PowerPoint presentations, for instance, the wiki could start out as a directory containing the location of each file on the company’s file server, and then eventually come to semantically hold all the data contained in those files.
So there it is: a semantic wiki system (in this case, SMW, although if it takes off I’m sure other wikis will copy this functionality), plus custom APIs per system, provides the ability to do relatively pain-free data integration.
I’m not the first person to think of data integration by means of components publishing their own data; in fact, that’s been one of the suggested uses of so-called Semantic Web technology, where each component publishes data in a format like RDF or OWL, and semantic reasoners and SPARQL queries pull it all together. That, too, is a valid approach; my basic objection is that I think it’s overkill: if the goal is to get all the RDF outputs to be compatible with one another, you can easily get bogged down in a world of competing ontologies and mismatched data. With a wiki at the center, on the other hand, each component can just publish its data in the simplest format possible, and let the wiki deal with all the data-matching and exception-handling.