King's College London, 3rd - 6th July 2010

The State of Non-Traditional Authorship Attribution Studies – 2010: Some Problems and Solutions

Rudman, Joseph
Carnegie Mellon University, USA

In 1997, at the ACH-ALLC'97 conference at Queen's University, R. Harald Baayen, David I. Holmes, Joe Rudman, and Fiona J. Tweedie presented a session titled "The State of Authorship Attribution Studies: (1) The History and the Scope; (2) The Problems – Towards Credibility and Validity." In the thirteen years since that session, well over 600 studies and papers dealing with non-traditional authorship attribution have been published.

This paper looks back at that session, at a subsequent article by Rudman in Computers and the Humanities, "The State of Authorship Attribution Studies: Some Problems and Solutions," and at the more than 600 new publications. There are still major problems in the “science” of non-traditional authorship attribution. The paper goes on to assess the present state of the field – its successes, failures, and prospects.


It has been an exciting thirteen years with many advances. Each of the following (not a complete list) will be discussed:

  1. Arguably, the most significant development in the field is the large contingent of computer scientists who have brought their perspectives to the table, led by Shlomo Argamon, Moshe Koppel, and a host of others.
  2. The DIMACS Working Group on Developing Community.
  3. Sir Brian Vickers' London Authorship Forum.
  4. John Burrows' Busa Award.
  5. Forensic Linguistics.
  6. Successful studies such as Foster's Primary Colors work.
  7. The continuing advances of practitioners such as John Burrows, David Hoover, Matthew Jockers, David Holmes, and others.
  8. John Nerbonne's reissue of Mosteller and Wallace's Applied Bayesian and Classical Inference: The Case of "The Federalist Papers."
  9. Patrick Juola's "Ad Hoc Authorship Attribution Competition" and his NSF-funded JGAAP project.
  10. The PAN Workshops (Uncovering Plagiarism, Authorship, and Social Software Misuse).


Contrary to what many practitioners of non-traditional attribution proclaim, the field has not gained widespread acceptance.

There have been many high-profile problems, with concomitant negative publicity, e.g.:

  1. Foster's misattribution of A Funerall Elegie
  2. Foster's misattribution of the JonBenét Ramsey ransom note
  3. Burrows' attribution then de-attribution of “A Vision”
  4. The continuing bashing of Morton's CUSUM

Burrows' reversal is what every good scientist should do: search for errors or possible improvements in the experimental methodology, and self-correct.

Failures and Shortcomings

After thirteen years of increasing activity, there is still no consensus on the correct methodology or technique. Most authorship studies are still governed by expediency, e.g.:

  • The texts are not the correct ones, but they were available.
  • The controls are not complete, but obtaining the correct ones would have taken too long.

The “umbrella” problem remains – most non-traditional authorship practitioners do not understand what constitutes a valid study.

Problems in the following areas will be explicated and solutions proposed:

  • Knowledge of the Field (i.e. the Bibliography) – That there have been so many authorship studies is good; that they have been published in over 90 different journals, which makes a complete literature search difficult and time-consuming, is not. To make things even more difficult, add to this more than 14 books, 22 book chapters, 80 conference papers, 10 reports, 22 dissertations, 9 newspaper articles, 10 self-published online papers, and 4 encyclopedia entries.
  • Reproducibility – verification
  • The Experimental Plan
  • The Primary Data – This is a major problem that is almost universally side-stepped.
  • Style markers – Function words, n-grams, etc.
  • Cross Validation – necessary but not sufficient
  • The Control Groups – Genre, gender, time frame, etc.
  • The Statistics – A range of techniques will be discussed, e.g. Neural Nets, SVMs, Sequence Kernels, Naïve Bayes
  • The Presentation – visualization


In conclusion, there is a discussion of our role as gatekeepers:

  • Rudman's caution that attribution studies on the Historia Augusta are an exercise in futility.
  • Hoover and Argamon's modification and clarification of Burrows' Delta.
  • Rudman's “Ripost” of Burrows' “History of Ophelia.”
  • Should we oppose patents such as Chaski's?
  • The Daubert triangle.
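For readers unfamiliar with the Delta measure referred to above, the following is a minimal Python sketch of Burrows' Delta in its commonly described form. It takes the n most frequent words of the candidate texts, converts each text's relative frequencies of those words to z-scores, and averages the absolute z-score differences between the disputed text and each candidate. It is an illustration under stated assumptions, not Hoover's or Argamon's modified versions and not any particular published implementation: the function names, the input format (a dict of pre-tokenized candidate texts), the default n = 30, and the use of the population standard deviation across candidates are all hypothetical choices.

```python
from collections import Counter
from statistics import mean, pstdev


def relative_frequencies(tokens, words):
    """Relative frequency of each word in `words` within `tokens`."""
    counts = Counter(tokens)
    total = len(tokens) or 1  # guard against an empty text
    return [counts[w] / total for w in words]


def burrows_delta(candidates, test_tokens, n_words=30):
    """Return a Delta score per candidate (lower = stylistically closer).

    candidates: hypothetical input format, a dict mapping each candidate
    author to a list of tokens from that author's known texts.
    """
    # 1. Choose the n most frequent words in the pooled candidate texts.
    pooled = Counter()
    for tokens in candidates.values():
        pooled.update(tokens)
    words = [w for w, _ in pooled.most_common(n_words)]

    # 2. Relative-frequency profile of each candidate and of the disputed text.
    profiles = {a: relative_frequencies(t, words) for a, t in candidates.items()}
    test = relative_frequencies(test_tokens, words)

    # 3. Mean and population standard deviation of each word across candidates
    #    (an assumption; other choices of reference corpus are possible).
    columns = list(zip(*profiles.values()))
    means = [mean(col) for col in columns]
    sds = [pstdev(col) for col in columns]

    def z_scores(profile):
        # A word with zero spread across candidates cannot discriminate; score it 0.
        return [(f - m) / s if s > 0 else 0.0 for f, m, s in zip(profile, means, sds)]

    # 4. Delta = mean absolute difference between z-score profiles.
    z_test = z_scores(test)
    return {
        a: mean(abs(x - y) for x, y in zip(z_scores(p), z_test))
        for a, p in profiles.items()
    }
```

The candidate with the lowest score is only the nominal attribution: the measure says nothing about whether the true author is among the candidates, and its output is only as sound as the texts, controls, and validation behind it.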


