Tuesday, July 31, 2012

The Wide World of Physics

I've been thinking more than usual lately about spatially representing the data in the various Bookworm browsers.

So in this post, I want to do two things:

First, give a quick overview of the geography of the ArXiv. This is interesting in itself--the ArXiv is the most comprehensive source of scientific papers for physics and mathematics, and plays a substantial role in some other fields. And it's good for me going forward, as a way to build up some code that can be used on other collections.

Second, to put some code online. I've been doing most of my work lately--writing as well as coding--in RStudio using Yihui Xie's fantastic Knitr package. The idea is to combine code with text to allow, simultaneously, literate programming and reproducible research. Blogger is a pain, but all the source and text for this post is up at the Rpubs site, which is a very interesting project that encourages sharing research. You can go read this post there instead of here if you want the code, though there are a few small changes, and the YouTube clip is only available here.

The basic idea--to jump ahead a bit--is that it might be useful to create charts like the following, which show differing geographical patterns of usage. (Here, people talk about Harvard near Harvard, and Stanford near Stanford--but in Europe, Stanford seems to win out near the big particle physics projects in Italy and Switzerland.)

How we do that--and what we get from it--are both a little tricky.

Thursday, July 12, 2012

Making and publishing history in the Civil War

A follow-up to my post from yesterday about whether there's more history published in times of revolution. I was saying that I thought the dataset Google uses must be counting documents of historical importance as history, because libraries tend to shelve in a way that conflates things that are about history and things that are history.

I realized after posting that the first of the two graphs in Michael Witmore and Robin Valenza's post actually shows a spike in publications of US history somewhere near 1860. (It actually looks closer to the late 1850s, but there aren't any grid lines on the chart.) Bookworm is pretty much useless in the 17th century, but it's on solid ground in the 1860s. And I've long known there was something funny going on in Bookworm around the Civil War, particularly in the History class.

So--is there more history published in the Civil War period in the Bookworm database? What kind?

Wednesday, July 11, 2012

Do revolutionaries really read history?

A quick post about other people's data, when I should be getting mine in order:

[Edit--I have a new post here with some concrete examples from the US Civil War of the pattern described in this post]

Michael Witmore and Robin Valenza have a post up on the Wine Dark Sea about how the kinds of books that are published can give us fascinating windows on the intellectual climate in moments of historical change. I (of course) agree strongly with this. But I want to offer an alternative, and somewhat deflating, interpretation of the central evidence they use.

Their post uses the following plot (presented by Google's Jon Orwant at a meeting with humanists) as evidence that more books about history are published (and therefore read--a difficult but not completely unwarranted leap) in periods of great revolutionary change. The effect jumps out particularly at the English and French revolutions. The chart shows this in "general and old world history":


Joe Adelman suggests a number of problems with using book publication as a metric; several are accurate. I could offer a few more questions (e.g., where's 1848?), but none would unsettle the central point. It would be, as Witmore and Valenza say, very interesting if "publishers are offering more history for readers who, perhaps, think of themselves as living through important historical changes." Even if only in those two periods.

My guess, though, is that we're seeing an artifact of data here, and not history. Here's why: