Tuesday, February 12, 2013

Mapping Spokane's Dead: A Pedagogical Experiment in Flash-mob Data Visualization

I taught Digital History last quarter. The course was a lot of fun as a dozen traditionally trained MA students and I explored some of the new digital historical landscapes. The class was divided between reading discussions and a weekly "make" where we bust open some new digital tool and see what we can do with it. For our weekly make a few months ago we created, populated, and visualized a historical database in about an hour. This post is about what we did and how we did it.

I had a few different pedagogical goals. First, I wanted my students to understand the importance and power of constructing a database, rather than merely building a website. Second, I wanted them to explore one tool for doing so, Google Fusion Tables. Third, I wanted to use some of the rich historical resources of my employer, the Washington State Archives, Digital Archives. Finally, I wanted them to experience the difficulty, decisions, and compromises of building a database and extracting metadata from handwritten historical documents.

Working with my grad student, the excellent Lee Nilsson, I chose the Spokane County Death Returns, 1888-1907 as our data set. These records were appealing for several reasons: they were interesting, the images of the death certificates and some metadata were already online, they represented a broad cross-section of the population of early Spokane, and they presented certain complications in terms of handwriting and uneven data. Here is a sample death return from the collection, that of the unfortunate Owen J. Jones:

When the State Archives originally scanned and indexed these records, they chose to record as the metadata the first and last names, age at death, and place of death. This was a good start, but missed some data that was recorded on the death certificates and that historians would find important. So we added race, occupation, place of birth and cause of death to the metadata fields that we wanted to capture.
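The resulting set of fields can be sketched as a simple record type. This is only an illustration, not the actual Fusion Tables schema we used: the class name, field names, and the `missing_fields` helper are hypothetical, and every field other than the name is optional because so many returns were incomplete.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DeathReturn:
    """One row of the class database (hypothetical field names)."""

    # Fields the State Archives had already indexed
    first_name: str
    last_name: str
    age_at_death: Optional[str] = None   # kept as text; ages are not always in years
    place_of_death: Optional[str] = None
    # Fields the class added
    race: Optional[str] = None
    occupation: Optional[str] = None
    place_of_birth: Optional[str] = None
    cause_of_death: Optional[str] = None

    def missing_fields(self) -> List[str]:
        """Return the names of fields that are absent or blank."""
        return [
            name
            for name, value in vars(self).items()
            if value is None or not value.strip()
        ]


# A record with only the name transcribed so far
record = DeathReturn(first_name="Owen J.", last_name="Jones")
print(record.missing_fields())
```

Listing the blank fields for each row is one way to see how much of a return a transcriber was actually able to capture.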

Then Nilsson created a Google Fusion table with our metadata fields and entered the information from a few death returns. Right away he realized that one problem the students would run into was transcribing the causes of death--things like phthisis (tuberculosis of the lungs) and marasmus (malnutrition) written in sometimes terrible 19th-century handwriting. A quick Google search turned up some lists of 19th-century causes of death. Nilsson added about 150 names from the 1880 death index to the Google Fusion table and used color highlighting to organize the list in groups of ten names. I took the email addresses of the students in the class and gave them permission to edit the table.

The actual lesson took about an hour. We gave each student ten names and asked them to read the death certificates and to add the metadata to the table. Nilsson and I circulated in the classroom to help people out. The names and dates went pretty smoothly. The students stumbled with causes of death at first, but the guides to 19th-century causes of death cleared up most questions. A bigger problem was missing data. 1880 was the first year of death records in Spokane and the record keeping was erratic. Places of birth, causes of death, and other items were not always filled out.

Then came the experiment part--visualizing the data. My original inspiration for the project was the idea that we would produce a map of where in Spokane people had died. This did not work so well:

In retrospect the reasons are clear. A few of the death certificates gave street addresses, and Google was able to map these accurately. In other cases no place was given, or it was given only as "Spokane." In some cases there was a location but the student transcribing was unable to read it or just did not bother. With better instruction and monitoring this map might have come out better.

Much more satisfying was this map of where the people were born. It nicely illustrates patterns of migration into 19th-century Spokane. Don't miss the guy from China!

Google Fusion Tables allows for other types of visualization as well. Here is a pie graph of causes of death:

Those are some mighty thin slices of pie! The chart does not tell us very much, except that typhoid and pneumonia were common. We would have been better off creating categories--contagious disease, accidents, infant death, etc.
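Here is a minimal sketch of what that grouping step might look like if the table were exported and processed in Python. The keyword lists are illustrative guesses rather than a vetted medical classification, the sample causes are made up for the example, and a real version would need to be checked against the guides to 19th-century causes of death.

```python
from collections import Counter

# Hypothetical keyword lists; each cause is matched by substring.
CATEGORY_KEYWORDS = {
    "contagious disease": ["typhoid", "pneumonia", "phthisis", "tuberculosis"],
    "accident": ["accident", "drown"],
    "infant death": ["stillborn", "premature"],
}


def categorize(cause):
    """Map one transcribed cause of death to a broad category."""
    text = (cause or "").strip().lower()
    if not text:
        return "not recorded"
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return "other"


def category_counts(causes):
    """Count how many causes fall into each category."""
    return Counter(categorize(cause) for cause in causes)


# Made-up sample values, not taken from the actual death returns
sample = ["Typhoid fever", "pneumonia", "Phthisis", "Accidental drowning", "", "Old age"]
for category, count in category_counts(sample).most_common():
    print(category, count)
```

A handful of broad slices would make a far more readable pie than dozens of one-off diagnoses, at the cost of someone having to decide where each diagnosis belongs.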

You can also make bar graphs--here is one for the occupations of the deceased. The laboring classes seem to have had it bad in early Spokane, though you would need some demographic analysis to draw any conclusions here:

Finally, here is a frequency chart showing the distribution of deaths throughout the year. The chart is interactive; you can click and drag to explore it. Notice anything odd? The Grim Reaper seems to have forgotten Spokane entirely some months:

The students were quite puzzled by this and came up with all sorts of reasons that several months could have passed without a death. Of course the most likely explanation is simply that the records for those months have gotten lost in the century since they were initially recorded.
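A quick way to spot such gaps without eyeballing a chart is to count deaths per month and list the months with no records at all. The sketch below assumes the dates of death have already been parsed into Python `date` objects; the function names and the sample dates are made up for illustration.

```python
from collections import Counter
from datetime import date


def deaths_per_month(dates):
    """Count deaths for each (year, month) pair."""
    return Counter((d.year, d.month) for d in dates)


def empty_months(dates):
    """List (year, month) pairs with no deaths between the first and last record."""
    counts = deaths_per_month(dates)
    if not counts:
        return []
    year, month = min(counts)
    last = max(counts)
    gaps = []
    while (year, month) <= last:
        if (year, month) not in counts:
            gaps.append((year, month))
        # Step forward one month, rolling over into the next year
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return gaps


# Made-up dates, not taken from the actual death returns
sample = [date(1890, 1, 5), date(1890, 1, 20), date(1890, 4, 2)]
print(empty_months(sample))
```

Months that come back empty are candidates for lost records rather than evidence of a miraculously healthy season.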

All in all, this pedagogical experiment was a great success. My students learned how digital sausage is made--the decisions that go into choosing what metadata to record and visualize, the challenges of working with handwritten 19th-century documents, and the amount of painstaking work that goes into a data visualization.