<?xml version="1.0" encoding="UTF-8"?>
<?oxygen RNGSchema="../schema/xmod_web.rnc" type="compact"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0" xmlns:xmt="http://www.cch.kcl.ac.uk/xmod/tei/1.0"
    xml:id="ab-596">
    <teiHeader>
        <fileDesc>
            <titleStmt>
                <title>The State of Non-Traditional Authorship Attribution Studies &#x2013; 2010:
                    Some Problems and Solutions </title>
                <author>
                    <name>Rudman, Joseph</name>
                    <affiliation><orgName>Carnegie Mellon University</orgName>, <country>USA</country></affiliation>
                    <email>jr20@heps.phys.cmu.edu</email>
                </author>
            </titleStmt>
            <publicationStmt>
                <publisher>Centre for Computing in the Humanities, King's College London</publisher>
                <address>
                    <addrLine>Strand, London WC2R 2LS, England, United Kingdom. Tel:+44 (0) 20 7836 5454</addrLine>
                    <addrLine>http://www.kcl.ac.uk/cch/</addrLine>
                </address>
            </publicationStmt>
            <sourceDesc>
                <p>No source: created in electronic format.</p>
            </sourceDesc>
        </fileDesc>
        <revisionDesc>
            <change>
                <date>2010-05-16</date>
                <name>NG</name>
                <desc>CCHLite encoding</desc>
            </change>
        </revisionDesc>
    </teiHeader>
    <text type="paper">
        <body>
            <p>In 1997, at the ACH-ALLC'97 conference at Queen's University, there was a session
                presented by R. Harald Baayen, David I. Holmes, Joe Rudman, and Fiona J. Tweedie,
                &quot;The State of Authorship Attribution Studies: (1) The History and the Scope;
                (2) The Problems &#x2013; Towards Credibility and Validity.&quot; Thirteen years
                have passed and well over 600 studies and papers dealing with non-traditional
                authorship attribution have been promulgated since that session. </p>
            <p>This paper looks back at that session, a subsequent article published by Rudman in
                    <hi rend="italic">Computers and the Humanities</hi>, &quot;The State of
                Authorship Attribution Studies: Some Problems and Solutions,&quot; and the more than
                600 new publications. There are still major problems in the “science” of
                non-traditional authorship attribution. This paper goes on to assess the present
                state of the field &#x2013; its successes, failures, and prospects. </p>
            <div>
                <head>Successes</head>
                <p>It has been an exciting thirteen years with many advances. Each of the following
                    (not a complete list) will be discussed: <xmt:oList rend="arabic">
                        <item>Arguably, the most significant development in the field is the large
                            contingent of computer scientists who have brought their perspectives
                            to the table &#x2013; led by Shlomo Argamon, Moshe Koppel, and a host of
                            others. </item>
                        <item>The DIMACS Working Group on Developing Community.</item>
                        <item>Sir Brian Vickers' London Authorship Forum.</item>
                        <item>John Burrows' Busa Award.</item>
                        <item>Forensic Linguistics. </item>
                        <item>Successful studies such as Foster's <hi rend="italic">Primary
                                Colors</hi> work.</item>
                        <item>The continuing advances of practitioners such as John Burrows, David
                            Hoover, Matthew Jockers, David Holmes, and others. </item>
                        <item>John Nerbonne's reissue of Mosteller and Wallace's <hi rend="italic"
                                >Applied Bayesian and Classical Inference: The Case of &quot;The
                                Federalist Papers.&quot;</hi>
                        </item>
                        <item>Patrick Juola's &quot;Ad Hoc Authorship Attribution Competition&quot;
                            and his NSF-funded JGAAP project. </item>
                        <item>The PAN workshops: Uncovering Plagiarism, Authorship, and Social
                            Software Misuse. </item>
                    </xmt:oList>
                </p>
            </div>
            <div>
                <head>Acceptance</head>
                <p>Contrary to what many practitioners of non-traditional attribution proclaim,
                    there is not widespread acceptance of the field. </p>
                <p>There have been many high-profile problems with concomitant negative
                    publicity, e.g.: <xmt:oList rend="arabic">
                        <item>Foster's misattribution of <hi rend="italic">A Funerall
                            Elegie</hi></item>
                        <item>Foster's misattribution of the JonBenét Ramsey ransom note</item>
                        <item>Burrows' attribution then de-attribution of “A Vision”</item>
                        <item>The continuing bashing of Morton's CUSUM</item>
                    </xmt:oList>
                </p>
                <p>Burrows' shift is something that every good scientist should do &#x2013; search
                    for errors or improvements in their experimental methodology and
                    self-correct.</p>
            </div>
            <div>
                <head>Failures and Shortcomings</head>
                <p>After thirteen years of increasing activity, there is still no consensus as to
                    correct methodology or technique. Most authorship studies are still governed by
                    expediency, e.g.: <xmt:uList>
                        <item>The texts are not the correct ones, but they were available</item>
                        <item>The controls are not complete, but it would have taken too long to
                            obtain the correct ones </item>
                    </xmt:uList>
                </p>
                <p>The “umbrella” problem remains &#x2013; most non-traditional authorship
                    practitioners do not understand what constitutes a valid study. </p>
                <p>Problems in the following areas will be explicated and solutions proposed: <xmt:uList>
                        <item>Knowledge of the Field (i.e. the Bibliography) &#x2013; That there
                            have been so many authorship studies is good; that they have been
                            published in over 90 different journals is not, because it makes a
                            complete literature search difficult and time-consuming. Adding to the
                            difficulty are the more than 14 books, 22 book chapters, 80 conference
                            papers, 10 reports, 22 dissertations, 9 newspaper articles, 10
                            self-published online papers, and 4 encyclopedia entries. </item>
                        <item>Reproducibility &#x2013; verification</item>
                        <item>The Experimental Plan</item>
                        <item>The Primary Data &#x2013; This is a major problem that is almost
                            universally sidestepped. </item>
                        <item>Style markers &#x2013; Function words, n-grams, etc. </item>
                        <item>Cross Validation &#x2013; necessary but not sufficient</item>
                        <item>The Control Groups &#x2013; Genre, gender, time frame, etc. </item>
                        <item>The Statistics &#x2013; A range of techniques will be discussed,
                            e.g. Neural Nets, SVMs, Sequence Kernels, Naïve Bayes </item>
                        <item>The Presentation &#x2013; visualization</item>
                    </xmt:uList>
                </p>
            </div>
            <div>
                <head>Conclusion</head>
                <p>The paper concludes with a discussion of our role as gatekeepers: <xmt:uList>
                        <item>Rudman's caution that attribution studies on the <hi rend="italic"
                                >Historia Augusta</hi> are an exercise in futility.</item>
                        <item>Hoover and Argamon's modification and clarification of Burrows'
                            Delta.</item>
                        <item>Rudman's “Ripost” of Burrows' “History of Ophelia.”</item>
                        <item>Should we oppose patents such as Chaski's?</item>
                        <item>The Daubert triangle.</item>
                    </xmt:uList>
                </p>
            </div>
        </body>
        <back>
            <div>
                <listBibl>
                    <bibl>
                        <author>Argamon, Shlomo, et al.</author>
                        <date>2003</date>
                        <title level="a">Gender, Genre, and Writing Style in Formal Written
                            Texts</title>
                        <title level="j">Text</title>
                        <biblScope type="issue">23.3</biblScope>
                        <biblScope type="pp">321-346</biblScope>
                    </bibl>
                    <bibl>
                        <author>Baayen, Harald, Hans van Halteren, Anneke Neijt, and Fiona Tweedie</author>
                        <date>2002</date>
                        <title level="a">An Experiment in Authorship Attribution</title>
                        <title level="m" type="proceedings">JADT 2002: 6es Journées Internationales
                            d'Analyse Statistique des Données Textuelles</title>
                    </bibl>
                    <bibl>
                        <author>Brennan, Michael, and Rachel Greenstadt</author>
                        <date type="visited">July 14, 2009</date>
                        <title level="m">Practical Attacks Against Authorship Attribution
                            Techniques</title>
                        <ptr target="http://www.cs.drexel.edu/greenie/brennan-paper.pdf"/>
                    </bibl>
                    <bibl>
                        <author>Burrows, John</author>
                        <date>2007</date>
                        <title level="a">Sarah and Henry Fielding and the Authorship of The History
                            of Ophelia: A Computational Analysis</title>
                        <title level="j">Script &amp; Print</title>
                        <biblScope type="issue">30.2</biblScope>
                        <biblScope type="pp">69-92</biblScope>
                    </bibl>
                    <bibl>
                        <author>Chung, Cindy, and James Pennebaker</author>
                        <date>2007</date>
                        <title level="a">The Psychological Functions of Function Words</title>
                        <title level="m">Social Communication</title>
                        <editor>K. Fiedler</editor>
                        <publisher>Psychology Press</publisher>
                        <pubPlace>New York</pubPlace>
                        <biblScope type="pp">343-359</biblScope>
                    </bibl>
                    <bibl>
                        <author>Feiguina, Ol'ga, and Graeme Hirst</author>
                        <date>2007</date>
                        <title level="a">Authorship Attribution for Small Texts: Literary and
                            Forensic Experiments</title>
                        <title level="m" type="proceedings">Proceedings of SIGIR '07 Workshop on
                            Plagiarism Analysis, Authorship Identification, and Near-Duplicate
                            Detection</title>
                        <name type="venue">Amsterdam</name>
                    </bibl>
                    <bibl>
                        <author>Foster, Donald W.</author>
                        <date>February 26, 1996</date>
                        <title level="a">Primary Culprit: An Analysis of a Novel of Politics</title>
                        <title level="j">New York</title>
                        <biblScope type="pp">50-57</biblScope>
                    </bibl>
                    <bibl>
                        <author>Khosmood, Foaad, and Robert Levinson</author>
                        <date>2006</date>
                        <title level="a">Toward Unification of Source Attribution Processes and
                            Techniques</title>
                        <title level="m" type="proceedings">Proceedings of the Fifth International
                            Conference on Machine Learning and Cybernetics</title>
                        <name type="venue">Dalian</name>
                        <biblScope type="pp">4551-4556</biblScope>
                    </bibl>
                    <bibl>
                        <author>Love, Harold</author>
                        <date>2002</date>
                        <title level="m">Attributing Authorship: An Introduction</title>
                        <publisher>Cambridge University Press</publisher>
                        <pubPlace>Cambridge</pubPlace>
                    </bibl>
                    <bibl>
                        <author>Niederkorn, William S.</author>
                        <date>20 June 2002</date>
                        <title level="j">The New York Times</title>
                        <biblScope type="pp">B1, B5</biblScope>
                    </bibl>
                    <bibl>
                        <author>Ramyaa, Congzhou, and Khaled Rasheed</author>
                        <date>2004</date>
                        <title level="a">Using Machine Learning Techniques for Stylometry</title>
                        <title level="m" type="proceedings">International Conference on Machine
                            Learning (MLMTA'2004)</title>
                        <name type="venue">Las Vegas</name>
                    </bibl>
                    <bibl>
                        <author>Rudman, Joseph</author>
                        <date>2007</date>
                        <title level="a">Sarah and Henry Fielding and the Authorship of "The History
                            of Ophelia": A Ripost</title>
                        <title level="j">Script &amp; Print</title>
                        <biblScope type="issue">31.3</biblScope>
                        <biblScope type="pp">147-163</biblScope>
                    </bibl>
                    <bibl>
                        <author>Solan, Lawrence M., and Peter M. Tiersma</author>
                        <date>2005</date>
                        <title level="m">Speaking of Crime</title>
                        <publisher>The University of Chicago Press</publisher>
                        <pubPlace>Chicago</pubPlace>
                    </bibl>
                    <bibl>
                        <author>Stamatatos, Efstathios, Nikos Fakotakis, and George
                            Kokkinakis</author>
                        <date>2001</date>
                        <title level="a">Automatic Text Categorization in Terms of Genre and
                            Author</title>
                        <title level="j">Computational Linguistics</title>
                        <biblScope type="issue">26.4</biblScope>
                        <biblScope type="pp">471-495</biblScope>
                    </bibl>
                    <bibl>
                        <author>Stein, Benno, et al. (eds.) </author>
                        <date>2009</date>
                        <title level="m">PAN'09</title>
                    </bibl>
                    <bibl>
                        <author>Tambouratzis, George</author>
                        <date>2001</date>
                        <title level="a">Assessing the Effectiveness of Feature Groups in Author
                            Recognition Tasks with the SOM Model</title>
                        <title level="j">IEEE Transactions on Systems, Man, and Cybernetics – Part
                            C: Applications and Reviews</title>
                        <biblScope type="issue">36.2</biblScope>
                        <biblScope type="pp">249-259</biblScope>
                    </bibl>
                    <bibl>
                        <author>Van Halteren, Hans</author>
                        <date>2004</date>
                        <title level="a">Linguistic Profiling for Authorship Recognition and
                            Verification</title>
                        <title level="m" type="proceedings">42nd Annual Meeting of the Association
                            for Computational Linguistics</title>
                        <name type="venue">Barcelona</name>
                    </bibl>
                </listBibl>
            </div>
        </back>
    </text>
</TEI>
