Tuesday, January 27, 2009

Are We Losing Our History in the Digital Age?

Time for a scare-report! British Library warns of 'black hole' in history if websites and digital files are not preserved: "Historians face a ‘black hole’ of lost information if we do not preserve websites and other digital records, the head of the British Library warned today. Chief executive Lynne Brindley said our cultural heritage is at risk as the internet evolves and technologies become obsolete."

Well, maybe. The article underestimates the efforts already underway to preserve at least some digital records. There is the Internet Archive, which maintains a huge cache of expired webpages. (The Wayback Machine is invaluable for recovering information when you hit an expired link.)

And of course there is the magnificent Washington State Digital Archives, my employer. We preserve the websites of former Washington governors Mike Lowry and Gary Locke, among others.

The other problem with the "black hole" argument is that it compares the spotty preservation of digital records to an imaginary paper past where every record was lovingly archived and preserved in climate-controlled isolation. But every historian learns soon enough that huge chunks of our historical record are missing. Twain's articles in the Territorial Enterprise are gone, burned up with the rest of the archives in an 1875 fire. A 1973 fire in Saint Louis destroyed 16 to 18 million military personnel files dating back to 1912. The Library at Alexandria was burned.

And yet we have histories of all these times and people. I would dearly love to be able to read all of Twain's articles as a fledgling journalist, but the handful that survive, Twain's own accounts of his Nevada years, and other primary sources from the period give us a pretty clear idea of what was happening to Virginia City and to Twain in those years. Future historians will find records enough for writing their histories of the early 2000s.

[Burning paper image from Flickr user The Shifted Librarian and used via a Creative Commons license. I added the wise-ass text using Picasa 3. This story is also being discussed over at Metafilter.]

1 comment:

peacay said...

Ah damn, didn't see that mefi post. I have been making tentative enquiries just recently on this subject.

There's a site (no, not mine) that is, in my opinion, a cultural gem. It has closed up shop and the site owner (it's a blog, image heavy, by a mefite) is happy to keep hosting for a few years yet.

My idea was to try and arrange for some way for the site to be preserved, and in discussions with a few relevant people (which included my citing that Brit Library piece, incidentally a very lightweight treatment of this subject, but I digress) I was disturbed to learn that it just isn't done. Oh, the Lib of Congress and the Aussie Nat Lib both do a bit of website preservation, but it's piecemeal really. Archive.org is inefficient and seemingly directionless, although I confess to not knowing too much about how it decides which sites/parts of sites are archived, not to mention whether images are stored or not.

The Washington site you mention I haven't seen but I presume it's a good thing for the locality. But to my mind we either have to instigate some UN body to create and administer an international archive to which, say, countries or thematic groupings (art, science etc) can nominate a certain amount of data or websites or whatever every year.

We need science bloggers making recommendations about what science blogs or blog entries are worth keeping in some sort of central warehouse. Washington may be good for Washington, but a lot (the majority?) of web material is not bound or determined or defined geographically.

Alternatively 'we' need to agitate for charitable foundations to step in. Carnegie grants (or whatever... name pulled from arse as example) to establish an artistic site preservation endowment or some such.

As it stands we only really have archive.org's waybackmachine for the website material. That's just not good enough imho. The site that I want preserved is of significance on a couple of fronts that kind of destroys (partly) the argument that says that halfway operable waybackmachine samplings are ok. The site is going to be an exemplar in 5, 10, 50 years for just how blogging was able to progress and be elevated to the types of dedicated broadcast mediums that we both have, versus the humble diary-like beginnings of the blog as publication. It also has an invaluable collection of artistic images not available elsewhere on the web (true, some are in contravention of copyright, but that's kind of a separate issue). So snippets of this blog as found on waybackmachine, for instance, are not going to demonstrate what a superlative site it was (is!). Yes, it's a special case (imho! yes of course, but I only use it - giornale nuovo - as an example), but it is a useful one because any systems put forward need to be established with the idea of how best/easiest to preserve the cream such as giornale. Any ideas for systems shouldn't be predicated on driftnet fishing criteria such as with the waybackmachine.
(sorry for rambling and any incoherence)