I just attended the Digital Humanities conference, which was
conveniently held in Maryland this year (it generally alternates years
between the US and Europe). Having been away for a few years, I was
impressed with how much stronger the presentations were. People are
being much more thorough and quantitative, and grappling with tougher
questions more systematically. They're also working with much larger
datasets, which is encouraging.
One especially cool talk (even though it doesn't relate very
directly to the Semantic Web) was by Fenella French, who works on
preservation of objects like the original Star-Spangled Banner, the
Constitution, and so on. Another talk in the same session discussed
using CAT scanning to view the texts of ancient scrolls that are too
fragile to even unroll.
More directly relevant, there was a very strong track on authorship
studies: finding ways to statistically analyze documents and seek
evidence on which were written by whom. This has been attempted at least
since the 1920s, but most studies didn't devote the effort to test a
proposed method on a range of known cases (in fairness, until recently
it was very difficult/expensive to get much machine-readable text to
work on).
Many such studies try to decide the authorship of older texts:
manuscripts in ancient Greek, or the Federalist Papers (a series of
articles by Alexander Hamilton, James Madison, and John Jay advocating
ratification of the US Constitution, which are important sources for
those who wish to understand the document). A dozen of these are of disputed
authorship.
Authorship questions sometimes come up in court ("who wrote that
ransom note?"). And on the Web, identifying posters in forums could be
useful for moderators trying to stop a persistent user from creating
new accounts after being booted (hopefully for adequate reasons!), or
just to connect up threads.
A great many statistics have been used for this, but this year people
were discussing how you can compare and contrast them to get better
overall results, rather than focusing on one or two that they perhaps
find intuitive in a particular setting. I'm looking forward to checking
this out in more detail in relation to OpenAmplify.
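To make the idea of combining measures a bit more concrete, here is a
minimal sketch in Python. It is not any particular method presented at
the conference: the three measures, the short function-word list, and
the choice to combine them by summing ranks are all my own illustrative
assumptions. It also assumes every text contains at least one word.

```python
import re
from collections import Counter

# Hypothetical, very short function-word list chosen for illustration only.
FUNCTION_WORDS = ["the", "of", "and", "to", "upon", "by", "on", "while"]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def function_word_profile(tokens):
    """Relative frequency of each function word in the text."""
    counts = Counter(tokens)
    return [counts[w] / len(tokens) for w in FUNCTION_WORDS]


def mean_word_length(tokens):
    """Average number of characters per word, as a one-element vector."""
    return [sum(len(t) for t in tokens) / len(tokens)]


def type_token_ratio(tokens):
    """Distinct words divided by total words, as a one-element vector."""
    return [len(set(tokens)) / len(tokens)]


# Each measure maps a token list to a feature vector (assumed set of measures).
MEASURES = [function_word_profile, mean_word_length, type_token_ratio]


def distance(a, b):
    """Sum of absolute differences between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def rank_candidates(disputed, candidates):
    """Order candidate authors by their summed rank across all measures.

    `candidates` maps an author name to a sample of that author's known
    writing. Lower summed rank means closer to the disputed text overall.
    """
    disputed_tokens = tokenize(disputed)
    candidate_tokens = {name: tokenize(text) for name, text in candidates.items()}
    rank_sums = {name: 0 for name in candidates}
    for measure in MEASURES:
        target = measure(disputed_tokens)
        dists = {
            name: distance(target, measure(tokens))
            for name, tokens in candidate_tokens.items()
        }
        # Rank 0 goes to the candidate closest on this one measure.
        for rank, name in enumerate(sorted(dists, key=dists.get)):
            rank_sums[name] += rank
    return sorted(rank_sums, key=rank_sums.get)
```

Calling `rank_candidates(disputed_text, {"Hamilton": sample_h, "Madison": sample_m})`
returns the names ordered from most to least similar. Summing ranks
rather than raw distances keeps a measure with a large numeric scale
from drowning out the others, which is one simple way to avoid leaning
on a single favorite statistic.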
Steve
PS: I also just posted a discussion question under "Authorship and short texts."
Posted 29 Jun 2009 8:23 AM by sderose