OpenAmplify
DH 2009 conference, and authorship determination

I just attended the Digital Humanities conference, which was conveniently held in Maryland this year (it generally alternates years between the US and Europe). Having been away for a few years, I was impressed with how much stronger the presentations were. People are being much more thorough and quantitative, and grappling with tougher questions more systematically. They're also working with much larger datasets, which is encouraging.

One especially cool talk (even though it doesn't relate very directly to the Semantic Web) was by Fenella French, who works on preservation of objects like the original Star-Spangled Banner, the Constitution, and so on. Another talk in the same session discussed using CAT scanning to view the texts of ancient scrolls that are too fragile to even unroll.

More directly relevant, there was a very strong track on authorship studies: finding ways to statistically analyze documents and seek evidence on which were written by whom. This has been attempted at least since the 1920s, but most studies didn't devote the effort to test a proposed method on a range of known cases (in fairness, until recently it was very difficult and expensive to get much machine-readable text to work on).

Many such studies try to decide the authorship of older texts: manuscripts in ancient Greek, or the Federalist Papers (a series of articles by Alexander Hamilton, James Madison, and John Jay advocating ratification of the US Constitution, which are important sources for those who wish to understand the document). A dozen of these are of disputed authorship.

Authorship questions sometimes come up in court ("who wrote that ransom note?"). And on the Web, identifying posters in forums could be useful for moderators trying to stop a persistent user from creating new accounts after being booted (hopefully for adequate reasons!), or just to connect up threads.

A great many statistics have been used for this, but this year people were discussing how you can compare and combine them to get better overall results, rather than focusing on the one or two that a researcher happens to find intuitive in a particular setting. I'm looking forward to checking this out in more detail in relation to OpenAmplify.
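To make the idea of combining statistics a bit more concrete, here is a minimal Python sketch. It is only an illustration under assumptions of my own, not a method presented at the conference: the short function-word list, the two measures (function-word frequencies and mean word length), the equal weighting, and all the function names are hypothetical choices. A real study would use many more features and would test them on texts of known authorship first.

```python
import math
import re
from collections import Counter

# Hypothetical marker list; real studies typically use far more function words.
FUNCTION_WORDS = ["the", "of", "and", "to", "in", "upon", "while", "whilst"]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def function_word_profile(tokens):
    """Relative frequency of each marker word in the token list."""
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[w] / total for w in FUNCTION_WORDS]


def mean_word_length(tokens):
    """Average token length in characters (0.0 for an empty list)."""
    return sum(len(t) for t in tokens) / (len(tokens) or 1)


def cosine_distance(a, b):
    """1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def combined_distance(text_a, text_b, weight=0.5):
    """Weighted blend of two stylistic distances (weight is an assumption)."""
    tokens_a, tokens_b = tokenize(text_a), tokenize(text_b)
    d_words = cosine_distance(function_word_profile(tokens_a),
                              function_word_profile(tokens_b))
    len_a, len_b = mean_word_length(tokens_a), mean_word_length(tokens_b)
    d_length = abs(len_a - len_b) / max(len_a, len_b, 1e-9)
    return weight * d_words + (1 - weight) * d_length


def closest_candidate(disputed, candidates):
    """Name of the candidate whose sample is stylistically nearest."""
    return min(candidates,
               key=lambda name: combined_distance(disputed, candidates[name]))
```

Calling closest_candidate(disputed_text, {"Hamilton": sample_h, "Madison": sample_m}) returns the name of whichever writing sample scores the smaller combined distance. On short texts a result like this is a weak hint at best.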

Steve

PS: I also just posted a discussion question under "Authorship and short texts."


Posted 29 Jun 2009 8:23 AM by sderose