Archive for the 'Uncategorized' Category

News from the exciting world of SMW

Friday, February 12th, 2010

Some random Semantic MediaWiki-based news, that I haven’t gotten to because I’ve been away from my blog… future updates like this will probably show up on the WikiWorks blog instead. So what does that leave for this blog? Who knows.

I’m back

Thursday, February 11th, 2010

I’ve let this blog languish for over two months, which is much longer than I meant to. Somehow other things kept taking priority…

What have I been up to since my last post? A bunch of random stuff: I flew to Shanghai and San Francisco, and my wife and I went to Vancouver, and upstate New York (we also had our honeymoon before my previous post, which I meant to write about but then never did - it was great. We also have lots of photos… argh.) We celebrated the new decade with drinks and Beatles Rock Band. I released new versions of most of my MediaWiki extensions, plus Semantic Bundle.

We had a meeting of our New York MediaWiki users meetup. I created the Referata FAQ and Semantic MediaWiki FAQ. I answered lots and lots of questions about Referata, Semantic MediaWiki, and my other software (sadly, not everything was covered in the FAQs). And, maybe most importantly, WikiWorks has started taking off - we’re already working on our first projects, we have more that are still under discussion, and we have a service agreement contract, just like a real company would… also, as of a few days ago we have a blog and a Twitter account. Please follow, or comment, or whatever it is the kids are doing these days.

Announcing WikiWorks

Thursday, November 5th, 2009

I’m thrilled to announce WikiWorks - a MediaWiki-focused consulting company that I just launched. This is my first serious business venture, unless you count Referata. But it doesn’t feel like a huge leap into the unknown, because consulting is already what I do - I’ve done at least some paid MediaWiki work for dozens of sites and companies over the last few years. The difference now is the additional people - WikiWorks is a small team of programmers around the world, all with significant experience setting up (and, in some cases, developing) MediaWiki; the goal is to make myself expendable, as it were, so projects can run smoothly even if I, or any other one person, can’t work on them at the time. We’re automating the process. Most of us also have other jobs at the moment, but these kinds of projects can almost always be done on a part-time basis, during off-hours; and in-depth projects involving full-time work, should they come, will be handleable in one way or another. The focus is on Semantic MediaWiki-based solutions, though we’re also equipped to take on regular, non-semantic projects.

So - if you’re from a company that would like to set up a wiki the right way, send us an email. If your company has a need for an easily-configured but powerful data integration system, and you would prefer software that’s free to something that costs a million dollars, send us an email. If you have too many Excel spreadsheets flying around the office, send us an email. If you already run a MediaWiki-based wiki, but want to make it nicer-looking, more user-friendly, and more like a true database application, send us an email. We’re looking forward to making some wikis.

Software, West Coast-style

Tuesday, November 3rd, 2009

I had an action-packed trip to California about a week ago. First was the 2009 Google Summer of Code Mentor Summit, which turned out to essentially be an open-source development conference, sponsored in an extremely generous way by Google. It took place at the Google campus, AKA the “Googleplex”, which I saw a long time ago back when it was the SGI campus, but now looks rather different. What can I say - for all the talk of cutbacks, it looks like Googlers still have it pretty good. The cafeteria food was so good, it made me just want to stay in the cafeteria all day.

The conference itself was quite interesting. I especially liked the talks about the non-development aspects of open source software, like the discussions on marketing and inter-project communication (I wrote the notes for both of those sessions, which I don’t think is a coincidence because I was interested in those topics to begin with). It was eye-opening to see that every open-source project, even the established ones with foundations and business models and lots of users (all categories that potentially describe both Wikimedia and MediaWiki), struggles with the same issues of gaining “buzz” and coordinating decisions that regular software companies, for better or worse, have professionals handling.

I also got to spend time with my brother and his wonderful family. And yes, I did go to this party, which was awesome (it was essentially a party full of people at various software startups, which you would never, ever see in New York); and, separately, I went to this great vegetarian restaurant as well.

After the weekend, it was time to head to the new MediaWiki office in San Francisco, where I met for two days with the members of the Wikipedia Usability Initiative. We had some very interesting and fruitful discussions, all on the subject of the template forms project, which is what I’m involved with. Lots of discussions about naming, which is always trickier than it seems!

In what really is a coincidence, earlier today I released the TemplateInfo extension, which is the first draft of my section of the work for the template-forms project. Hopefully it’ll end up on a gigantic website before too long.

Update: Oops, I forgot to post a link to the photo of all GSoC Mentor Summit attendees. Can you spot me? Hint: I’m in the back row, right next to the tree, in a blue hoodie.

Gone till December

Friday, October 23rd, 2009

Things have been busy lately, of course, and it looks like they’ll stay busy for the next month and a half… in the interests of keeping people informed, and in lieu of continuous Twitter feeds, I figured I’d share my upcoming plans:

  • This weekend and part of next week, I’ll be in California for the Google Summer of Code Mentors’ Summit, and to visit the Wikimedia Foundation people again.
  • While there, I may or may not also be attending this party.
  • The weekend after that is… Halloween.
  • The weekend after that, my lovely wife and I will be jetting off to Karlsruhe, Germany for SMW Camp 2009.
  • After that’s over, we’ll be flying to a few cities in Southern Europe for a few weeks for our honeymoon. European vacay - champagne and cigarettes!
  • Then it’s time for Thanksgiving.
  • A week and a half after that, I’ll most likely be flying to Shanghai to talk about Semantic MediaWiki at the Asian Semantic Web Conference.

I’m looking forward to the honeymoon, but otherwise I don’t know how well this new role of international jet-setter fits me… hopefully 2010 will be calmer all around.

Going to the (reception hall) of love

Friday, October 2nd, 2009

I had delayed writing about this for a fairly long time, partly because I decided a while ago that this blog was going to be strictly about technical and semi-technical issues, and I stopped writing about personal stuff, pop culture things, etc. After a while I started feeling guilty that I hadn’t written about it yet, which just made it hard to write something about it, thinking about how I’d have to explain my delay in writing about it, which just made the situation worse, etc. etc.

Anyway… I’m extremely pleased to say that I’m getting married in two days. (!!) My lovely bride, who for now prefers to remain anonymous on the internet, is named Lee (that’s actually her nickname, but it’s what everyone calls her), and I’ve known her for four years, and she’s the light of my life.

For an only-adequate, somewhat-too-Photoshopped photo of the two of us, but the only reasonable one I could find on this short notice, see here.

Forms coming to Wikipedia?

Thursday, September 24th, 2009

I’m doing some part-time work for the Wikimedia Foundation now, on the usability project; you can see the first fruits of my labor here - a proposal for template-based forms on Wikipedia (this, I should note carefully, would not be using Semantic Forms). And you can see the spirited, if mostly tangential, discussion about it on the Wikipedia developers mailing list here.

Announcing Semantic Internal Objects

Thursday, August 20th, 2009

My latest extension: Semantic Internal Objects; this is either number 10 or 12, depending on how you count it; which is hard to believe. What is Semantic Internal Objects? In short, it lets you encode compound information, or what’s sometimes known as “n-ary relations”, within Semantic MediaWiki. If you want to record that, say, someone is president of a country, you can do that easily with SMW. But if you want to record that that person was president from a certain year to a certain other year, that hasn’t been possible in SMW until now, because it can’t be represented as a simple relationship (okay, actually, it has been possible, through multi-value properties, but I don’t consider those an ideal solution for various reasons). Semantic Internal Objects (SIO), in short, lets you do that, using a new parser function. I’m very excited about this extension; I think it’ll open up a lot of possibilities for various SMW-based websites, but we’ll see…
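
To make the “n-ary relation” idea concrete, here is a rough sketch in Python rather than wiki syntax. It is not how SIO stores anything, and it does not show the parser function; the class, the names and the sample records are all made up for illustration. The only point is that a presidency with a start and end year is a small bundle of values attached to a person, not a single person-to-country link.

```python
from dataclasses import dataclass

# Hypothetical model, not SIO's actual storage format: each "internal
# object" is a small bundle of properties tied back to one wiki page.
@dataclass(frozen=True)
class Presidency:
    person: str      # the page the object belongs to
    country: str
    start_year: int
    end_year: int

# Made-up sample data.
presidencies = [
    Presidency("Person A", "Country X", 1990, 1994),
    Presidency("Person A", "Country X", 2000, 2004),
    Presidency("Person B", "Country X", 1994, 2000),
]

def president_in(country, year):
    """Return everyone recorded as president of `country` during `year`."""
    return [p.person for p in presidencies
            if p.country == country and p.start_year <= year <= p.end_year]
```

A plain “is president of” property could say that Person A was president of Country X, but not that there were two separate terms; keeping each term as its own object is what makes a question like `president_in("Country X", 1997)` answerable.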

Read “Lecturing Birds on Flying”

Wednesday, August 5th, 2009

Everyone should go out and read Pablo Triana’s new book, “Lecturing Birds on Flying: Can Mathematical Theories Destroy the Financial Markets?”. And I’m not just saying that because I’m quoted in it.

Okay, I am just saying that because I’m quoted in it. But - I’m quoted in it! The quote comes early on, on page 9, and it’s a paragraph from a blog post I wrote two years ago, in a review of Nassim Taleb’s “The Black Swan”, which also contained some thoughts about my old website, Betocracy, plus prediction markets and the “wisdom of crowds” theory. Dr. Taleb linked the blog post soon after I wrote it, on a page on his website that now appears to have been removed. That’s almost certainly where Triana read the post from, since he’s a devoted follower of Taleb’s.

I bought the book and read it, and it’s interesting - it’s an ode, in the manner of Taleb’s “The Black Swan” and “Fooled By Randomness”, to common sense in finance and a deep skepticism of “experts” who claim to have mastered the markets. Triana has the advantage of writing post-financial-crash, when the idea that the large banks were playing a con game has become standard opinion, right or wrong. He argues for it forcefully, with a focus on financial math, stating that mathematical formulations like the Black-Scholes formula and the concept of “value at risk” (VaR) are flawed and have provided cover for brazen financial gambles. More interestingly, he argues that Black-Scholes, though it’s taught as a basic rule of finance, is never actually used in the banking world; instead, traders make intuitive purchasing decisions that they then justify as some “fudge factor” on top of the supposedly set-in-stone Black-Scholes, using concepts like the “volatility smile”.

Anyway, you should all read it.

Interestingly, I found out about the book when I was called several months ago by the guy who recorded the audio version, to find out how to pronounce my name; it was among the stranger phone conversations I’ve had. And I guess this means a bunch of people have now heard my name as well; you can buy the audio version here, by the way, all 16 hours of it.

External Data grows again

Tuesday, June 23rd, 2009

The latest version of the External Data extension now lets you get data from two other sources (in addition to APIs and text files): LDAP servers, and database tables. This is a nice step forward, in that it’s no longer completely necessary to create an API for every data source you want to access from the wiki; which makes the concept of using MediaWiki for data integration potentially simpler and less breakable. Thanks to David Macdonald for this new functionality.

Meeting Metaweb

Wednesday, June 17th, 2009

I had a very interesting meeting about a week and a half ago with Robert Cook, the co-founder of Metaweb, i.e. the people behind Freebase. By sheer coincidence, we know someone (non-technical) in common, and he was visiting New York, so it all worked out. I certainly learned a good amount. For one thing, it was a pleasant surprise to find out that he’s a very friendly and personable guy. The meeting also cleared up some misconceptions I had had about Freebase, and their future plans. I had always thought of Freebase and Semantic MediaWiki as rivals - friendly rivals, perhaps, but still creators of similar products, possibly competing for some of the same customers. And if Wikipedia ever started using SMW, I imagined we’d become pretty much direct competitors, since the other co-founder of Metaweb, Danny Hillis, has referred to Freebase as “Wikipedia for data”. But it turned out that, far from fearing or being skeptical of Wikipedia adopting Semantic MediaWiki, Robert was very excited about the idea, and wanted to know what he could do to help. As I found out, Metaweb sees Freebase more as an aggregator of data than an original source of it (that’s my understanding, anyway). In other words, though users can directly add information to Freebase through the form interface, the much more important source is sites like Wikipedia, MusicBrainz, EDGAR, etc. Freebase’s strengths lie in matching up entities (i.e., knowing that data about a book from two different databases are about the same book), as well as querying and browsing - they have an extremely fast storage and querying system for their millions of items of data, and some slick interfaces for browsing through it all (see Parallax).

So a two-part solution suggests itself: Wikipedia, with some sort of semantic capability, handles the entry and display of data, along with basic aggregation, like lists and tables (and possibly maps and timelines, etc.); while Freebase takes in the data, then handles the complex browsing and querying that Wikipedia probably couldn’t allow, for performance reasons. Other sites could allow for querying and browsing of Wikipedia’s data as well, of course, but Freebase looks like they’re in a unique position to handle it all.

There’s also Freebase’s entity match-up, which is at the heart of Freebase’s new Common Tag effort. The idea is to, instead of using plain text tags for blog posts, news articles, etc., use Freebase entity IDs instead - so that there won’t be ambiguity about what a tag means. It’ll be interesting if this initiative takes off - as Robert noted, it’s not a substitute for true semantic triples, but it beats having “an ambiguous relationship to an ambiguous entity” (my recollection of how he described current tags).

Yaron has a wish list

Sunday, December 21st, 2008

I can’t believe I haven’t mentioned yet that my Amazon wish list is up. Feel free to peruse, especially if you’re feeling charitable this holiday season toward, say, people whose software you use. As you can see, I have a definite scarcity of books about web design and open source.

Edit this interface

Wednesday, November 12th, 2008

From “Why I love MediaWiki” by Brianna Laugher:

My single favourite thing about it is the ability (for sysops at least) to edit the interface, via editing pages in the MediaWiki: namespace. It really steps back and lets the users take ownership of their wiki. It makes me wish practically all software I used had such a function. (Unfortunately the “discoverability” of this fact is still low. Your chances of figuring this out without anyone telling you would be near zero.)

Yes, in MediaWiki an administrator can change just about every piece of text that users see, just by editing the corresponding page. It’s a pretty ingenious approach to customization, though it’s true that it’s hard to find out: I actually had to re-learn that fact a few times before I really remembered it, because it’s such an unexpected feature.

Good times for Red Hat

Tuesday, November 4th, 2008

Jim Whitehurst, the president and CEO of Red Hat (a major Linux company), says that the coming economic downturn is good for the open-source business, either relatively or maybe even in absolute terms:

In August Red Hat posted second quarter revenue 29 percent higher than the same quarter a year ago, while its subscription revenue also enjoyed double-digit growth to beat analysts’ estimates. Whitehurst said that while predictions of a recession will likely mean fewer new projects, the economic benefits of going open source are already encouraging proprietary customers to switch.

“I’ve had a couple of conversations with CIOs who said ‘we’re a Microsoft shop and we don’t use any open source whatsoever, but we’re already getting pressure to reduce our operating costs and we need you to help put together a plan for us to help us use open source to reduce our costs’”.

This fits my own thoughts, that the downturn, bad as it will be in general, will make MediaWiki-based businesses, including my own, more attractive since open-source software is a much cheaper solution than that of the Microsofts and SAPs of the world. The proprietary application that, in regular times, you don’t mind shelling out a few hundred thousand dollars a year for, plus whatever it costs to ship in a team of consultants to install and configure it, probably doesn’t make as much sense when revenues are down and you’re thinking about laying off employees.

Hello from Hellas (yes, that’s the best I could do)

Monday, July 14th, 2008

I’m currently at the airport in Athens, Greece, on a layover on the way to Alexandria, Egypt, where I’ll be taking part in the Wikimania conference. I’m definitely looking forward to it: I’ll be speaking at two events - a workshop called “Creating the structured semantic wiki”, and a panel/workshop (I don’t know what it’ll be, exactly) on “the state of Semantic MediaWiki”, with two of the co-creators and main developers of SMW, Markus Krötzsch and Denny Vrandečić. I’m also looking forward to seeing them and many of the other important people working on MediaWiki, Wikipedia, and some related projects. Alexandria was chosen as a venue because it’s housed, since 2002, the “new Library of Alexandria”, AKA Bibliotheca Alexandrina, a library/conference center/performing-arts center that’s thematically appropriate to a wiki conference and also supposed to be very cutting-edge. The effect of the location has been to minimize the participation of Americans, but that should be offset by the presence of a lot of Europeans and British people, and of course many from the Arab and greater Middle Eastern world. I’ve never been to Egypt before, or anywhere else in Africa, and I’ve barely visited Arab countries before, so it should all be an interesting experience. I’ll write more about the conference here later - I don’t know if I’ll do frequent updates or just one end-of-the-conference wrapup, but I bet there’ll be a lot to write about in either case.

Technology updates

Thursday, May 8th, 2008

Some interesting technological improvements and news recently…

  • I released Replace Text, my latest extension, a week and a half ago. It’s a fairly minor extension, just doing a search-and-replace across the pages in a wiki, but it’s important for certain circumstances.
  • Semantic Drilldown had a big update about two weeks ago - it now supports multiple values per filter, finding the set of results that match any one of those values; some of the display has been improved as well. For example, from a German site, here’s a list of countries that are either constitutional monarchies or federal republics.
  • Sergey Chernyshev has also released two fantastic extensions recently: Header Tabs, which quickly applies a tabbed interface onto any page, and Widgets, which allows for the easy placement of any widgets (like videos, slideshows, feeds and many others) onto one’s wiki. Header Tabs has already gotten a lot of usage, though I think Widgets will be the more transformative one, enabling a whole new set of functionality without any need for programming.
  • One wiki that uses Semantic MediaWiki and Semantic Forms, that’s gotten a lot of buzz recently, is Cause Caller, which is actually an application around a wiki - the wiki gives information about American politicians, while the application lets you make phone calls to those politicians’ offices, to give them your opinion on various political issues. As far as I can think of, this is the first automated usage of data from a Semantic MediaWiki-based site; i.e. where the data is used for purposes other than just reading it. Its creator also made an entertaining screencast demonstrating the wiki. I think this might be the first online video to mention Semantic Forms by name. Unfortunately it doesn’t actually show a form (maybe there are more videos to come); though Header Tabs does appear quite conspicuously. It does remind me that I should put together my own long-planned screencast…
  • Other interesting SF-based wikis that have shown up recently: The Music Snob, a resource for musicians (which also has a nice usage of Semantic Drilldown), C-Pop Fantasie, a nicely-designed site that covers Chinese pop music, and BioVenturist.com, which covers biotechnology companies, technologies and venture capitalists, all important information.

Is “the Semantic Web” a helpful term?

Thursday, April 17th, 2008

The time has come, I think, to ask whether “the semantic web” is a good term to use, even though a lot of people use it, and I even belong to a semantic-web meetup or two. The problem with it, I think, is that it creates an incorrect picture in people’s minds of a structure that will show up at some point in the future, enabling various magical abilities. The phrase creates some confusion, in that it raises some unanswered questions: what will “the semantic web” actually look like? Who will create it? And how will we know when it’s arrived?

Now, it could be that enabling more semantic export of online data will indeed have some magical effects. My issue, though, is that phrasing it in such a way makes the whole endeavor more intimidating than it needs to be, suggesting that it’s a project that has yet to even really start. In fact, semantic technologies are not only with us already, but some are in widespread use. RSS is the obvious example: it’s a widely-used file format that displays information about blog posts, news articles and the like in a machine-readable way, so that, using a feed reader, one can be instantly notified about new posts, including their title and other basic information, from any of hundreds of thousands of sources. That’s as semantic as it gets.
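
As a small illustration of what “machine-readable” means here, the sketch below pulls post titles and links out of an RSS 2.0 feed using only Python’s standard library. The feed contents and the `list_posts` name are made up for the example; a real feed reader would download the XML and handle many more fields.

```python
import xml.etree.ElementTree as ET

# Made-up feed for illustration; a real reader would download this XML.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>First post</title>
      <link>http://example.com/first</link>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/second</link>
    </item>
  </channel>
</rss>"""

def list_posts(feed_xml):
    """Return (title, link) pairs for every item in an RSS 2.0 feed."""
    channel = ET.fromstring(feed_xml).find("channel")
    return [(item.findtext("title"), item.findtext("link"))
            for item in channel.findall("item")]
```

No guessing about page layout is involved: the title is whatever sits in the `title` element, for any feed that follows the format.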

More generally, there’s obviously plenty of structure already in the non-semantic (“syntactic”) web. Sites that cover everything from weather to shopping to reference to news display their data in a structured way, retrieving it from relational databases. In some cases, like Amazon, APIs are provided so that one’s application can retrieve this data directly. But even if there’s no API, or other semantic export of the data, it can be retrieved anyway, through web scraping. The NewYorkNabes, which I did the programming for, is one of maybe tens of thousands of examples - it gets its real-estate-price information by going to a set of URLs on newyork.backpage.com once a week, finding the relevant prices within the HTML, and taking their median. If Backpage were to additionally publish their data in RDF form, they would be a true semantic web site, and it would be easier for my code to get that same data. But functionally, things would look exactly the same to users as they do now. You could argue that the difference is that the semantic web data would be retrievable even if the look of the site changed: web scraping is a fragile endeavor, and in theory the system can break if any part of the HTML, like just a font color, is changed. But if you think about it, the same holds true for semantic data: if the owners decide to change a property name from “Price” to “Rental price”, the system will break just as easily. Neither approach offers a full guarantee, and they both require maintenance: the difference is only one of degree, not kind.
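
The scrape-then-take-the-median approach described above can be sketched in a few lines of Python. This is not the actual NewYorkNabes code: the HTML, the `price` class name and the function names are all assumptions made for illustration, and the pages are passed in as strings instead of being downloaded.

```python
import re
from statistics import median

# Hypothetical markup; the real listing pages are not reproduced here.
SAMPLE_HTML = """
<div class="listing"><span class="price">$1,800</span> Sunny studio</div>
<div class="listing"><span class="price">$2,400</span> One bedroom</div>
<div class="listing"><span class="price">$3,100</span> Two bedroom</div>
"""

# The scraper's one assumption about the page: prices sit in this exact tag.
PRICE_PATTERN = re.compile(r'<span class="price">\$([\d,]+)</span>')

def extract_prices(html):
    """Find every price in one page of HTML and return them as integers."""
    return [int(m.replace(",", "")) for m in PRICE_PATTERN.findall(html)]

def median_price(pages):
    """Pool the prices from several pages and return their median, or None."""
    prices = [p for page in pages for p in extract_prices(page)]
    return median(prices) if prices else None
```

The fragility is easy to see: if the site renamed the class from `price` to `cost`, `extract_prices` would quietly return an empty list. A consumer of RDF data that looked up a property named “Price” would fail in the same way after a rename to “Rental price”.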

I’ve sometimes thought that a good analogy for the value of storing data semantically is a well-organized kitchen: if a kitchen has all its tools and supplies logically arranged and in their place, then it’s easy to find any particular item, and, maybe just as importantly, to know if an item is missing, so that if you don’t see it you won’t end up spending an hour looking for it. If you walk into such a kitchen, even if you’ve never been in it before, you’ll probably be able to start cooking right away. By contrast, the regular web can be compared to a disorganized kitchen, where everything is strewn all around, mixed in haphazardly: the blender could be anywhere, and if there’s no baking soda in the kitchen, good luck determining that for sure.

But this analogy also highlights the gray area between “semantic” and “syntactic”. After all, there’s no such thing as a perfectly-organized kitchen, since any two people’s conceptions of how things should be organized will be different. If you’re looking for wine glasses, will you look near the ordinary glasses, or near the fancy plates? However you arrange things, some people still won’t be able to find what they’re looking for right away, because they’re expecting it elsewhere. Similarly, there are always ambiguities in data - to take one small example, retrieved from this fascinating list of “edit wars” that have emerged in Wikipedia over silly data ambiguities, if you’re displaying consumer products on your site, do you refer to a regular iPod as an “iPod” or (the new term) an “iPod classic”? Even the most carefully-laid-out semantic data will still need some human analysis, and “massaging” of the data, to be usable in an application, and to be aggregated with other data sources, because there’s always ambiguity or differences of opinion over how data should be structured.

But if semantic data can resemble syntactic data, the reverse is true as well. To go back to the analogy, even the messiest kitchen is still usable: if you found yourself having to use one, given enough time, you could figure out where everything is and muddle through. After a few months of working in one, you could probably accomplish everything that you could in a well-organized kitchen. It wouldn’t be nearly as enjoyable, of course, but it would be possible. The comparison can be made to a site like NewYorkNabes, which by its nature is a hack, but it works. The difference between syntactic and semantic, again, emerges as one of degree.

My point here is not that all the talk about the benefits that semantic technology like RDF and OWL will bring is overhyped: I won’t try to predict the changes that they will or won’t bring, but I would guess that there will be some substantial benefits to their adoption. I just think “semantic web” is a bad way to describe this technology, because it makes it seem like a goal to be accomplished, so that one day people can say, “the semantic web has been created”, instead of what I think is the more realistic description, which is a gradual process that began a long time ago of making data more accessible. Instead of “the semantic web”, I think I prefer the terms “semantic technology” or “semantic representation”, or even “semantic web technology”. Heck, even “Web 3.0” is fine with me, since people understand that “Web 2.0” is about a set of technologies and not a separate structure - an adjective, not a noun.

The web in one line

Sunday, March 16th, 2008

Appropriately, since I just mentioned them, my friend Nick’s company, which is being “incubated” by Y Combinator, just launched: Wundrbar. (There’s a German-language pun in the title, which I think is intentional, but I can’t remember now.) It displays a single command line that lets you do so-called “deep searching” (going directly to the relevant page on a website) for various sites for weather, shopping, reference, etc., as well as actions like blogging and emailing. You could compare it to YubNub, which has a similar concept, but it’s less overwhelming in its options (in my opinion) and more geared toward consumer applications. It fits in exactly with my philosophy of making the web easier to use, and I wish them best of luck with it.

Fast, cheap, etc.

Monday, March 10th, 2008

Some interesting thoughts about software development and work in general, all via 37 Signals’ Signal vs. Noise:

“Programmer happiness is the most important factor in making quality software”. I completely agree. The author calls this approach “emo programming”, which - well, I like emo the music genre, so I can’t really complain.

Six Principles for Making New Things: “find (a) simple solutions (b) to overlooked problems (c) that actually need to be solved, and (d) deliver them as informally as possible, (e) starting with a very crude version 1, then (f) iterating rapidly.” Written by Paul Graham, whose “incubator”, Y Combinator, a friend of mine is currently working at; so I hope it’s good advice. I mean, I know it’s good advice; that’s been my philosophy for a while now.

In praise of lazy.

I am a hacker

Thursday, March 6th, 2008

You can see my first-ever MediaWiki change, added earlier today, right here.

Yes, Wikipedia runs on a few hundred thousand lines of code, and I wrote exactly one of those lines. You can thank me later.