Sunday, June 16, 2013

Open Letter to the Historians of the 22nd Century: Sorry for All the Stuff

So today I read one of the sillier posts I have ever seen on an academic blog, in which the author bemoans an imagined decline of diary-keeping in the present era. Future historians will have nothing to write about! The author urges us to sharpen our quill pens, pull out a piece of vellum and begin journaling. "This is your duty," he proclaims. "Create that thing that historians crave—real, firsthand accounts."

The silliness, of course, stems from the fact that the author is using a blog on the internet to complain that no one is writing about their lives anymore.

The truth is precisely the opposite--we are living through an explosion of personal writing and documentation that is unprecedented in human history. Over a billion people are on Facebook, posting about their days, complete with pictures. Half a billion are on Twitter. There are tens of millions of blogs. And let’s throw Instagram and similar services into the mix. And of course there is email–which is indeed being saved, as recent news revelations about the NSA should reassure all of us.

Cats, in particular, are being documented to an amazing extent.

And before anyone sets up the straw man of comparing your aunt Edna’s Facebook updates (“Doubled the prune juice this morning–will let you all know how it works!”) to Mary Chesnut’s diary, let me point out how extremely atypical the handful of historians’ favorite diaries are. As every working historian has come to realize, for every Mary Chesnut or George Templeton Strong, there are a hundred surviving diaries of stoic Norwegian farmers or busy mill workers that are considerably less than illuminating:

Noviember 2, 1863: rained.
Novermber 3, 1863: Raind
Nobember 5, 1863: Cow dyed.
Novebmer 8, 1863: Did'nt rain

In contrast, future historians of my era (for whom this post is written) will have information sprayed at them with a fire hose.

Imagine if Thomas Jefferson had a Facebook page, commented on pages of his FB friends, tweeted (“Tip to fellow planters: Increase farm income w/ a nail factory manned by young slaves! #slavery #childlabor #Monticello”), and kept an Instagram account (“Used the sepia filter for this pic of Sally on a bearskin rug. #naughty”). Plus all of his contemporaries FBing and tweeting about Jefferson! We would have vastly more information about the man. And Lord knows there is no shortage of primary source information about Jefferson as is.

The real revolution in personal writing and documentation for our era, however, is the way that it will illuminate the lives of us peasants. Every fry cook at McDonald’s has a Facebook page. And as I hinted above, it is not just that people are writing more than ever before. Future historians will have hundreds of millions of images (from Instagram alone) of people's daily lives. Add tens of millions of videos. And then there is the metadata--GPS locations for those posts and images, networks of friends and sharing, tags and hashtags! For broader context the 22nd-century historian will pull up an archived Google Street View of the neighborhood, see what cable services the subject subscribed to, and peruse old Amazon order histories and wish lists.

At this point I can hear some of my contemporary readers (those nattering nabobs of negativity) saying "Not so fast--most of these sources you name are commercial services, with privacy policies and limited sharing. It isn't like you can just pull up the Facebook pages of all Cleveland fry cooks and sort by the text 'salmonella.' And how many of these records will even exist in 100 years?"

Not yet, you can't, but you will be able to. My 22nd-century readers will of course be aware of the Steve Jobs Personal Privacy Elimination Act of 2037, but even readers of my own era should know about the rapid erosion of privacy. Even before the phenomenon became apparent, there was a general principle known as the "75 year rule" that most government documents became public after that amount of time. And as for the saving of old Facebook posts and the like, data is money, and data is security, and storage costs continue to fall like a stone. Our LOL Cats are safe for eternity.

So to all those future historians who stumble across this blog post long after I am dead: Sorry for all the stuff. I know you people are going to have unimaginable tools for sorting, thinning, combining and analyzing the mountain of "real, first-hand accounts" that my generation has been thoughtlessly creating. Still, I know that on some days you must grow weary of examining the 746 thousand variations on a single meme. You must sometimes think "Stupid dead person, when your hard drive gets full don't just buy a bigger backup, sort your damn files!" You must spend days reading the Facebook feed of some 13-year-old who later became famous and feel despair.

Sorry about that, historians of the 22nd century. I am sorry that I made so many blog posts featuring someone else's YouTube video. Sorry that so many of my Facebook updates are vacuous. Sorry that my Tweets bring down the tenor of the entire medium. Sorry about all of my files undescriptively labeled DSCimage987234534.jpg and GrantProposal2.docx. Sorry for the mess.

I did make you a present, though:


Unknown said...

This is overly optimistic.

1) Storage costs will not 'fall like a stone' for much longer. Costs are already off the expected curve that Kryder's Law plots, and will continue to deviate further and further over time. 'Save it all forever' is already an impossibility, and will only become more and more detached from reality over time.

Good reading on the topic:

2) Most user-generated data is in walled-off environments (Facebook and such) which are, for any real purposes, impossible to access. The track record of such companies willingly handing their holdings for long-term preservation/access is rather miserable - MySpace obliterated an unknown number of blogs last week (with no warning) as part of a 'feature rollout.'

And, of course, 'acquired by Yahoo' is tantamount to 'doused in gasoline and prepared for the incinerator.' There's good reason why archivists were in a frenzy when Tumblr was acquired last month.

For cases like Facebook, the chances of its material lasting even twenty years are functionally zero. If the service can't be scraped from the outside, and there's no real chance of it being preserved from the inside, that doesn't leave much room for hope.

Larry Cebula said...

Thanks Alexander for your thoughtful reply and for the link. It is very possible that I am in fact too optimistic. (Though I would like to see your equation that leads you to state with such precision that the chances of FB material surviving are zero.)

Don't forget that you can download your FB data. Hardly anyone does because you cannot actually do anything with the data--but if FB were to announce a shutdown tomorrow, lots of people would do so. And someone would come up with a tool to use this data.

If even a few percent of the current flood of personal journaling survives, it will be massively more primary source information than we have for any previous generation of human history.

Unknown said...

This is great. I just stumbled upon your blog, and the Charlie LeDuff video drew me in immediately. (I had just heard of him a few days ago!) I have heard this same complaint about our lack of diary keeping from other historians. Your point that we people of the present era document the hell out of every moment of our lives in real time is spot on. You may be overly optimistic about the ability of future historians to access the material, but who knows? If the battles that have already begun over control of and access to social media data, emails, etc. mean anything, it's that this stuff has value. I wouldn't be surprised if someone, or some entity, is (secretly) collecting a good deal of it, but again who really knows? Either way, I agree that future historians will have a more than sufficient amount of primary source material to tell stories of the ordinary tweeting, instagramming, blogging folks of the early 21st century.

Here is an example of what that might look like:

Unknown said...

As an archival student, I have serious doubts that all of this stuff will survive for 22nd Century historians. I see only a very small fraction surviving (5% at the very most). Even in the traditional world of paper records, only around 5-10% is ever kept in an archives. . . And there are a whole lot more digital records that have to be dealt with.

There is also the problem that the records being created now contain less information than they used to. Instead of one detailed letter containing all of the relevant information, transactions are carried out through a long series of emails. To make matters worse, the truly useful information is buried amongst mountains of emails saying "hey, lunch @ 1?" or "oops, forgot the attachment."

We definitely owe 22nd Century historians an apology, but I think it should be more along the lines of "sorry we created more than we could ever possibly handle and probably lost a lot of really useful stuff along the way."

Larry Cebula said...

Hi Morgan!

We are all really guessing on how much will survive, aren't we? I tend to believe the percentage will be higher than people realize, due to both the falling costs of storage and the growing value of big data. But I don't know that; it is just my guess.

But suppose we take your low estimate of 5%--that is still more information about people than we have ever had for any previous generation. There is gonna be a lot of stuff.

Unknown said...

Hi Dr. Cebula! It's been a long time. I hope all is going well for you.

Yes, it really is a guessing game at this point. There are a few facts to base our guesses on, though. I realize now that my initial guess was short-sighted and a bit too negative. Your predictions are most likely what will happen in the long run. Please let me take another stab at this:

Storage costs are falling and people are realizing just how important big data and digital records really are. Most people have realized that something must be done. Unfortunately, most people aren’t sure how to do it yet. Digital records are all too often left in disarray and aren't properly backed up because many people just don't know how to organize and preserve them. Records managers are getting businesses and organizations sorted out, though!

Another huge problem right now is obsolescence. With the rapid change of technology and file formats - combined with a chronic lack of funding, time, and resources to migrate everything - many records end up left behind and unreadable. I’m still not sure if or when this will end, but I hope so. At the very least, it will lessen.

I admit, my outlook for the foreseeable future is still looking a little gloomy. Luckily, the foreseeable future is quite short! Our own archival technology and understanding of the digital world are improving rapidly and we are beginning to see the clouds lift.

The number of records managers is exploding, which is a huge benefit. And archivists are diligently working on solutions to the preservation and accessibility problems that come along with digital records. I have no doubts that we will get it all figured out. Things are progressing in our favor. I’m not sure when all the pieces of the puzzle will come together, but I know it’s just a matter of time.

I still think 5% is about right based on the percentage of physical records we keep, but you are absolutely correct about the quantity; 5% of all the digital records will be a far greater quantity of records than we have ever had before. And between the records managers and us archivists (or archivist-in-training, in my case), we can ensure that it is the most useful 5% possible. The future is quite a lot brighter than my first guess predicted.

P.S. I’m sorry to be so long-winded and I hope I’m not rambling. You know this isn’t normally like me.

509freckles said...

If you care to see this point in action in the present day, consider that the Library of Congress archives Twitter, and its collection reaches back to 2006.