In bringing the Dead Sea Scrolls into the cloud via an interface so user-friendly that even a humanities professor can navigate it, Google has once again played a part in something wonderful for the world. And I don’t mean “wonderful” in the modern web sense—like the way that Facebook has put me back in touch with friends from middle school, or that Twitter has turned one corner of my monitor into a global chatroom featuring most of the people that interest me. No, I mean something that will potentially advance the cause of historical scholarship in a way that hasn’t been done in about 200 years. Here’s why Google’s Dead Sea Scrolls project and efforts like it matter for all of us.
Textual scholarship: still trapped in the 19th century
In a not-so-distant past life, I was a University of Chicago humanities PhD candidate, specializing in early Christian and Second Temple Jewish apocalyptic literature. I spent five years at Harvard (for my two master’s degrees) and five years at Chicago immersed in dead languages, paleography, papyrology, and related textual disciplines, and by the time I left the academy in 2008 I had all but given up on practicing history. Instead, I decided that my true calling was to make tools that historians could use, because the tools that I had been trained to use as a historian were typically between one and two centuries old.
When I say “tools,” I’m talking about monumental works of scholarship, like the Greek-English Lexicon of Liddell and Scott; Smyth’s Greek Grammar; or the Nestle-Aland Greek New Testament. Even the modern library as a scholarly tool would be quite usable by an unfrozen humanities scholar from the year 1800. Sure, these tools and others like them have evolved over the decades, as new editors incorporate new scholarship into new editions. But as types of tools, the lexicon, the grammar, and the critical edition are still maintained and used much as they have been for two hundred years.
Even ambitious modern attempts to build advanced tools for the humanities—and here I’m thinking specifically of the Thesaurus Linguae Graecae and the Perseus Project—rest quite directly on foundations laid one or two hundred years ago. The TLG is just a searchable digital archive of Greek texts, one that indexes a body of transcriptions that were themselves made using very old tools and transcriptions. And the Perseus Project is largely an interface for accessing the LSJ and related texts, tools, and transcriptions from a much earlier era; the one truly new tool in the toolbox is morphological tagging.
(There are a ton of reasons why things have progressed so little in the past 200 years, but that rant will have to wait for another post on another day in another blog.)
The transcription and transliteration mess
The worst problem with modern textual scholarship isn’t in the tools themselves—rather, it’s in the transcriptions on which the updates to the aforementioned tools are based. First off, it’s rare that scholars get to compare a high-quality, full-color facsimile of a source text to the edited critical edition that forms the basis of their work. But what’s even rarer is the opportunity to compare a high-quality image of a source text to the transliteration and/or transcription that underlies the critical edition. (A transliteration is where a scholar tries to copy the source text exactly, misspellings and all; a transcription is a cleaned-up version of the transliteration, where spelling, punctuation, diacriticals, and the like are all normalized.) Transcriptions and transliterations are almost never released; all scholars see is the resulting, cleaned-up edition.
During my ten years of grad school, I had occasion to compare a number of scholarly transcriptions (and a few transliterations) to images of their sources, and my classmates, my professors, and I were routinely shocked at how much of that work the scholars behind it had failed to flag as questionable. There would be words and passages that clearly should’ve been marked in the transcription with a dot under them, to signify that the text was pretty much illegible and the scholar was just guessing, but they weren’t. And these transcriptions were, by and large, the work of a very small pool of scholars. In all, I was left with the impression that most modern scholarship rests on a very thin, questionable base of transcriptional effort that is totally unexamined and, indeed, not even available for examination.
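To make the distinction concrete (along with the dot-under-a-letter convention for uncertain readings), here’s a minimal sketch of how one might represent a single word as it moves from manuscript to edition. The record structure, field names, and the example word are my own illustrative assumptions, not any standard encoding scheme.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    """One word as it moves from manuscript to critical edition."""
    transliteration: str   # what the scholar actually sees, misspellings and all
    transcription: str     # normalized spelling, punctuation, and diacriticals
    uncertain: bool        # True if the reading should carry a sublinear dot


# Illustrative example: a word that is barely legible on the manuscript.
# An honest transliteration flags the guesswork; the published edition often does not.
damaged_word = Reading(
    transliteration="κυρ[.]ου",   # part of the word is a guess
    transcription="κυρίου",       # the cleaned-up form that ends up in the edition
    uncertain=True,
)

if damaged_word.uncertain:
    print(f"{damaged_word.transcription} rests on a guess: {damaged_word.transliteration}")
```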
What’s needed, then, is to go all the way back to the original documents, and begin work from there. The Dead Sea Scrolls project is the kind of thing that will let us do that.
Only one way forward
The only way forward out of this mess is to proceed in two phases, the first of which is to bring the primary sources themselves online in a way that gives scholars easy and totally open access to high-resolution, multi-spectral images. That’s the first step, and Google isn’t by any means the first institution to have a go at it. Scholars have been trying this since the dawn of the web, by uploading large image files to online repositories. More recently, for instance, there’s The Rare Book Room, a digital archiving effort that now includes some 400 manuscripts photographed at resolutions high enough that a single page can run to 200MB, with works from Shakespeare’s original folios to Galileo and Copernicus in the original.
There are also other tools out there that let you zoom in and navigate around primary source texts, but the Dead Sea Scrolls tool is one of the nicest that I’ve seen. So Google should keep doing these kinds of projects, taking what they’ve learned at every step and applying it to the growing mass of high-res text captures that’s already out there.
And there is definitely way more material out there looking for a way to get online in an era of university and non-profit budget cutting. For the better part of a decade, special collections departments at libraries across the country have been involved in high-resolution digitization efforts of old texts and pictures in their possession, so there’s a ton of data out there that has already been generated. But before the cloud, there was no cheap way to store it and make it universally accessible online. Now that Google can store this data and make it easily accessible with an interface like the DSS project, they should reach out to libraries and start taking in their data.
The second phase—and this is critical—is to offer some kind of open mechanism for letting scholars (and later, the interested public) attach metadata and conversation to the different layers (raw image, color-corrected image, multi-spectral image, transliteration, transcription) of the online text. The main tool that I’m aware of that’s explicitly designed to enable a community to mark up a document in something like the manner that will be needed is Bobby Fishkin’s ReframeIt, but there may be others that work similarly. At any rate, if scholars can’t edit and extend the metadata, and if they can’t easily attach conversation to specific points and layers in the text, then the first phase will be a waste.
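To make the layered-annotation idea concrete, here’s a minimal sketch of the kind of data model such a mechanism might be built around. The layer names come from the list above, but the classes, fields, and example are my own illustrative assumptions, not a description of ReframeIt or of any existing tool.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Layer(Enum):
    """The layers of a digitized text that scholars would annotate."""
    RAW_IMAGE = "raw image"
    COLOR_CORRECTED = "color-corrected image"
    MULTI_SPECTRAL = "multi-spectral image"
    TRANSLITERATION = "transliteration"
    TRANSCRIPTION = "transcription"


@dataclass
class Annotation:
    """A piece of metadata or conversation anchored to a region of one layer."""
    layer: Layer
    region: tuple          # e.g. a pixel box for images, a character span for text
    author: str
    body: str              # the scholar's note, reading, or question
    replies: List["Annotation"] = field(default_factory=list)


@dataclass
class TextObject:
    """One fragment or page, with all of its layers and attached annotations."""
    identifier: str
    annotations: List[Annotation] = field(default_factory=list)

    def annotate(self, layer: Layer, region: tuple, author: str, body: str) -> Annotation:
        note = Annotation(layer=layer, region=region, author=author, body=body)
        self.annotations.append(note)
        return note


# Hypothetical usage: a scholar questions one reading in the transliteration layer
# and anchors the conversation to that exact span, so others can reply in place.
fragment = TextObject(identifier="example-fragment-001")
fragment.annotate(
    layer=Layer.TRANSLITERATION,
    region=(2, 14, 2, 16),   # line 2, characters 14-16 (illustrative convention)
    author="scholar_a",
    body="Reading uncertain; compare the multi-spectral image before accepting it.",
)
```

The point of the sketch is simply that every note is tied to a specific layer and a specific region, so a dispute over one barely legible letter can live right where the problem is, rather than in a footnote to a printed edition.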
Given the rise of the cloud and of the social web in the last few years, I’m more optimistic than ever before that textual scholarship will soon be empowered to return directly to the primary sources, and to generate a new wave of new-from-the-ground-up tools and methods of the kind that hasn’t been seen since the 19th century. And that in turn will generate a new wave of scholarship that’s anchored in better fundamentals and is more open to the non-specialist public.