The curse of the money spider? Spinning the web of data

I was enamoured with the concept of a web of data when I heard Tom Coates discuss it in his excellent talk at dConstruct in 2007. The idea of moving away from silos of information – bounded by the extent of your website – to parcels of information moving between lots of websites – bounded instead by relevance – was an attractive one to me.

At the time I was working for a government heritage agency, responsible for the online presence of an organisation that managed over 14.5 million archive items relating to 280,000 archaeological and architectural sites and monuments throughout Scotland.

Despite its public remit, RCAHMS was perceived as closed and restrictive, and I was keen to promote the idea within the organisation of communicating its resources outwith the confines of its website(s), as well as enabling other users to contribute their own information to RCAHMS from other contexts and social networks.

I’ll offer my ramblings on the relative success of RCAHMS’ efforts in another blog post, but I wanted first to reflect on the notion of a web of data itself and whether it still has any currency two years on.

The idea of a web of data is one that has come under fire in the last week or so. A regular user of twitter, I was quite surprised to discover today that only a user’s last 3,200 tweets are visible through the service. In the meantime, tr.im can no longer support its url shortening service, social bookmarking site ma.gnolia withered unexpectedly earlier in the year, and we seemingly can’t even rely on flickr to conserve our much-loved snaps.

Is, then, the idea of a web of data – information, data and user-generated content moving freely between sites – still achievable? Is this a utopian, liberal ideal endorsed by those who have grown up with the open source and web standards communities? Or is it synonymous with the excesses of a pre-credit-crunch world, haemorrhaging money and minutiae across the infinite possibilities of affordable digital storage?

Web applications are easily built, cheaply deployed and quickly disseminated. But rarely are the full technical and financial implications of such ventures considered in detail.

So what does this mean for the web of data? Is the movement of information now merely transient (who would be interested in what I had to say more than 3,200 tweets ago)? Should we be curating and archiving these relics of communication? Or is this web of data about me just a further signifier of excess that can no longer be sustained within the current economic climate?

Are the strands of the web robust enough to survive such losses or are we just going to end up with loads of broken links in the chain? Quite literally [1].

[1] Apologies for the mixed metaphor.

Comments

Cole,

Interesting post and a damn good question. One of my interests is genealogy and family history, and I’ve always been hesitant to entrust a commercial service with the archival of old family pictures, etc., because I always wonder what will happen if that company goes under.

Of course, I would always, always, always keep a local backup of pictures and documents, but what about family sites where people collaborate on research? I’m a member of one such site, and the amount of information that’s built up through discussion and member submissions is incredible. It’s that sort of data I’m worried about losing.

This underscores the key feature any “storage” system on the web has to offer – usable, automatable export.

Witness Flickr. If Flickr announced they were shuttering at the end of the month, and would be offline at the end of the year, how would you export your photos? Yes, you can download them, but if you have thousands of images, are you going to download them one at a time? How long will that take?

And what about the comments? Or the tags? Or the millions of inbound links?

If you think that’s an unrealistic scenario, remember that Facebook is more popular for photo uploads than Flickr, and then ask anyone with a Geocities account how reliable Yahoo is as a storage provider.
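To be fair, Flickr does expose an API, so a bulk grab is at least scriptable if you get to it before the shutters come down. Here’s a rough sketch in Python – the API key and user ID are placeholders you’d supply yourself, and note that it only pulls down the public image files, with no error handling or rate limiting:

    # Sketch: bulk-download your public Flickr photos via the REST API.
    # The API key and user ID below are placeholders; private photos,
    # error handling and rate limiting are all left as an exercise.
    import json
    import urllib.parse
    import urllib.request

    API_KEY = "your-flickr-api-key"   # placeholder
    USER_ID = "your-user-id"          # placeholder, e.g. "12345678@N00"
    REST_URL = "https://api.flickr.com/services/rest/"

    def call_flickr(method, **params):
        """Call a Flickr REST method and return the parsed JSON response."""
        query = {"method": method, "api_key": API_KEY,
                 "format": "json", "nojsoncallback": 1, **params}
        url = REST_URL + "?" + urllib.parse.urlencode(query)
        with urllib.request.urlopen(url) as resp:
            return json.loads(resp.read())

    page, pages = 1, 1
    while page <= pages:
        data = call_flickr("flickr.people.getPublicPhotos",
                           user_id=USER_ID, per_page=500, page=page)
        photos = data["photos"]
        pages = int(photos["pages"])
        for p in photos["photo"]:
            # Source URL built from Flickr's documented URL pattern.
            src = ("https://live.staticflickr.com/"
                   f"{p['server']}/{p['id']}_{p['secret']}_b.jpg")
            urllib.request.urlretrieve(src, f"{p['id']}.jpg")
            print("saved", p["id"], "-", p.get("title", ""))
        page += 1

Even then you’ve only saved the pixels; the comments, tags and inbound links – everything that made the photos part of the web – stays behind.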

Even magnolia, which HAD a robust export system that I used only a month before its demise (to move to delicious, in case you wondered), only protected those members who had made a habit of backing up regularly.

I use Pivotal Tracker, a free system for tracking feature requests and implementation. It provides an API for creating exports, and I have a server hit that every hour or so to pull down the latest state in case “free” turns into “very expensive” or “unsustainable” tomorrow.
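In case it’s useful, that hourly pull doesn’t need to be anything clever – just a script run from cron that asks the API for the current state and writes it to disk. A rough sketch against Tracker’s v5 REST endpoints (the token and project id are placeholders, and pagination and error handling are skipped):

    # Sketch: snapshot a Pivotal Tracker project to a local JSON file.
    # Intended to be run from cron, e.g. once an hour:
    #   0 * * * * /usr/bin/python3 /home/me/backup_tracker.py
    import json
    import time
    import urllib.request

    TOKEN = "your-tracker-api-token"   # placeholder
    PROJECT_ID = "1234567"             # placeholder

    def fetch(path):
        """GET a Tracker resource for the project and return parsed JSON."""
        req = urllib.request.Request(
            "https://www.pivotaltracker.com/services/v5"
            f"/projects/{PROJECT_ID}{path}",
            headers={"X-TrackerToken": TOKEN})
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())

    snapshot = {
        "fetched_at": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "project": fetch(""),
        "stories": fetch("/stories"),
    }

    # Timestamped files mean the history survives even if the service doesn't.
    outfile = time.strftime("tracker-%Y%m%d-%H%M.json")
    with open(outfile, "w") as fh:
        json.dump(snapshot, fh, indent=2)
    print("wrote", outfile)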

When was the last time you backed up your company’s paper trail on Basecamp? Or exported your thousands of emails from Google?

Just like on the desktop, it’s all about backup. Gmail is less likely to catch fire than your office, but it’s much more likely to just not be there one day.
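For what it’s worth, Gmail speaks IMAP, so keeping a local copy of your mail doesn’t need anything exotic either. A minimal sketch using Python’s standard imaplib – the address and password are placeholders, and in practice you’d use an app password or OAuth rather than your real credentials:

    # Sketch: pull every message from Gmail's "All Mail" over IMAP and
    # write each one out as a raw .eml file. Credentials are placeholders.
    import imaplib

    HOST = "imap.gmail.com"
    USER = "you@example.com"      # placeholder
    PASSWORD = "app-password"     # placeholder

    imap = imaplib.IMAP4_SSL(HOST)
    imap.login(USER, PASSWORD)
    imap.select('"[Gmail]/All Mail"', readonly=True)

    # Find every message, then fetch and save the raw RFC822 source.
    status, data = imap.search(None, "ALL")
    for num in data[0].split():
        status, msg_data = imap.fetch(num, "(RFC822)")
        with open(f"mail-{num.decode()}.eml", "wb") as fh:
            fh.write(msg_data[0][1])

    imap.logout()

It won’t be quick for thousands of messages, but a cron job and a cheap disk beat hoping the cloud never has a bad day.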