Dissertations / Theses: 'Authorship attribution'

1

Calarota, Gabriele. "On Authorship Attribution." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/22809/.

Abstract:

Authorship attribution is the process of identifying the author of a given text; from the machine learning perspective, it can be seen as a classification problem. The literature offers many classification methods, each relying on some form of feature extraction. In this thesis, we explore information retrieval techniques such as Doc2Vec, along with other feature selection and extraction techniques, applied to a given text with different classifiers. The main purpose of this work is to lay the foundations of feature extraction techniques in authorship attribution. At the end of this work, we compare our results with related work and show how we improved, to the best of our knowledge, the results on a particular dataset that is well known in this field.

2

Honaker, Randale J. "Novel topic authorship attribution." Thesis, Monterey, California. Naval Postgraduate School, 2011. http://hdl.handle.net/10945/5761.

Abstract:

Approved for public release; distribution is unlimited
The practice of using statistical models in predicting authorship (so-called author-attribution models) is long established. Several recent authorship attribution studies have indicated that topic-specific cues impact author-attribution machine learning models. The arrival of new topics should be anticipated rather than ignored in an author attribution evaluation methodology; a model that relies heavily on topic cues will be problematic in deployment settings where novel topics are common. In order to effectively deal with novel topics, we create author and topic vectors and attempt to project out the topic influences from each document. Although our experiments did not validate our assumptions, they do point out a possible problem with a common assumption in authorship attribution research.

3

Lalla, Himal. "E-mail forensic authorship attribution." Thesis, University of Fort Hare, 2010. http://hdl.handle.net/10353/360.

Abstract:

E-mails have become the standard for business as well as personal communication. The inherent security risks within e-mail communication present the problem of anonymity. If an author of an e-mail is not known, the digital forensic investigator needs to determine the authorship of the e-mail using a process that has not been standardised in the e-mail forensic field. This research project examines many problems associated with e-mail communication and the digital forensic domain; more specifically e-mail forensic investigations, and the recovery of legally admissible evidence to be presented in a court of law. The Research Methodology utilised a comprehensive literature review in combination with Design Science which results in the development of an artifact through intensive research. The Proposed E-Mail Forensic Methodology is based on the most current digital forensic investigation process and further validation of the process was established via expert reviews. The opinions of the digital forensic experts were an integral portion of the validation process which adds to the credibility of the study. This was performed through the aid of the Delphi technique. This Proposed E-Mail Forensic Methodology adopts a standardised investigation process applied to an e-mail investigation and takes into account the South African perspective by incorporating various checks with the laws and legislation. By following the Proposed E-mail Forensic Methodology, e-mail forensic investigators can produce evidence that is legally admissible in a court of law.

4

Gerritsen, Corey M. (Corey Metcalf) 1979. "Authorship attribution using lexical attraction." Thesis, Massachusetts Institute of Technology, 2003. http://hdl.handle.net/1721.1/87414.

Abstract:

Thesis (M.Eng. and S.B.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2003.
Includes bibliographical references (p. 56-57).

5

Tennyson, Matthew Francis. "Authorship Attribution of Source Code." NSUWorks, 2013. http://nsuworks.nova.edu/gscis_etd/322.

Abstract:

Authorship attribution of source code is the task of deciding who wrote a program, given its source code. Applications include software forensics, plagiarism detection, and determining software ownership. A number of methods for the authorship attribution of source code have been presented in the past. A review of those existing methods is presented, while focusing on the two state-of-the-art methods: SCAP and Burrows. The primary goal was to develop a new method for authorship attribution of source code that is even more effective than the current state-of-the-art methods. Toward that end, a comparative study of the methods was performed in order to determine their relative effectiveness and establish a baseline. A suitable set of test data was also established in a manner intended to support the vision of a universal data set suitable for standard use in authorship attribution experiments. A data set was chosen consisting of 7,231 open-source and textbook programs written in C++ and Java by thirty unique authors. The baseline study showed both the Burrows and SCAP methods were indeed state-of-the-art. The Burrows method correctly attributed 89% of all documents, while the SCAP method correctly attributed 95%. The Burrows method inherently anonymizes the data by stripping all comments and string literals, while the SCAP method does not. So the methods were also compared using anonymized data. The SCAP method correctly attributed 91% of the anonymized documents, compared to 89% by Burrows. The Burrows method was improved in two ways: the set of features used to represent programs was updated and the similarity metric was updated. As a result, the improved method successfully attributed nearly 94% of all documents, compared to 89% attributed in the baseline. The SCAP method was also improved in two ways: the technique used to anonymize documents was changed and the amount of information retained in the source code author profiles was determined differently. 
As a result, the improved method successfully attributed 97% of anonymized documents and 98% of non-anonymized documents, compared to 91% and 95% that were attributed in the baseline, respectively. The two improved methods were used to create an ensemble method based on the Bayes optimal classifier. The ensemble method successfully attributed nearly 99% of all documents in the data set.

6

Grant, T. D. "Authorship attribution in a forensic context." Thesis, University of Birmingham, 2005. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.529439.

Abstract:

This thesis develops a quantitative method for forensic authorship attribution. The principal constraint, that the method be scientific according to the Daubert criteria, necessitates that the conclusions drawn about authorship problems must be made to a known degree of certainty. In response, the theoretical part of the thesis establishes the criteria for a sound method in authorship attribution as relying on valid, reliable markers of authorship and the development of an explicit and specific sampling strategy. The main empirical part of the thesis draws potential markers of authorship from the literature and tests them against a specially constructed General Authorship Corpus. The resulting battery of reliable markers of authorship includes word and sentence length statistics and word-frequency measures. A series of worked examples with decreasing number of texts demonstrates the method and tests its limits, showing positive attributions where possible and no false attributions even when comparison data is limited. In addition to the development and application of the battery of valid, reliable markers of authorship, the role of stylistic idiosyncrasies in attribution is discussed and developed as a secondary strategy. Possibilities for the statistical presentation of results are considered and a Bayesian approach is proffered as the most desirable.

7

Pires, David Laranjo. "Authorship attribution using co-occurrence networks." Master's thesis, Universidade de Évora, 2021. http://hdl.handle.net/10174/30831.

Abstract:

Authorship Attribution using Co-Occurrence Networks. This thesis approaches the task of Authorship Attribution as a classification task. This is done using methodologies that represent text documents as graphs, from which several measures are extracted, to be used as samples for the classifier. There have been some works that also focus on this methodology. This thesis focuses on a methodology which splits the texts into multiple parts and treats each as a separate graph, from which measures are extracted. Each graph's measures are treated as a time series and moments are extracted. These moments make up the final vector, representative of the entire text. This methodology is explored and extended with two variations. The first variation skips the time-series step, resulting in the various measures from each graph being used directly as samples. The second variation models the entire text as one graph. The methodologies are tested on corpora in both English and Portuguese, with varying numbers of texts.
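
The pipeline described in this abstract (split the text, build one graph per part, extract measures, summarise the series with moments) can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not say which graph measures or moments the thesis uses, so average degree and density, and mean and variance, are stand-ins, and the names `cooccurrence_graph`, `graph_measures`, and `text_vector` are hypothetical.

```python
import statistics


def cooccurrence_graph(tokens):
    """Undirected graph linking each word to its immediate successor."""
    graph = {}
    for a, b in zip(tokens, tokens[1:]):
        if a != b:  # ignore self-loops from repeated words
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph


def graph_measures(graph):
    """Two illustrative measures (assumed, not from the thesis)."""
    n = len(graph)
    edges = sum(len(nbrs) for nbrs in graph.values()) / 2
    avg_degree = 2 * edges / n if n else 0.0
    density = 2 * edges / (n * (n - 1)) if n > 1 else 0.0
    return [avg_degree, density]


def text_vector(text, n_parts=4):
    """Split the text, measure each part's graph, then take moments."""
    tokens = text.lower().split()
    size = max(1, len(tokens) // n_parts)
    # Leftover tokens beyond n_parts full chunks are dropped for simplicity.
    parts = [tokens[i:i + size] for i in range(0, len(tokens), size)][:n_parts]
    series = [graph_measures(cooccurrence_graph(p)) for p in parts]
    vector = []
    for column in zip(*series):  # one column per measure, one row per part
        vector += [statistics.mean(column), statistics.pvariance(column)]
    return vector
```

The resulting vector could then be passed to any classifier. Skipping the moment step and using each part's measures directly would correspond to the first variation mentioned in the abstract, and calling `graph_measures` on the whole text to the second.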

8

Gopalakrishnan, Sridharan. "Authorship Attribution based on Grammar Signatures." University of Cincinnati / OhioLINK, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1368026620.

9

Caver, Johnnie F. "Novel topic impact on authorship attribution." Thesis, Monterey, California : Naval Postgraduate School, 2009. http://edocs.nps.edu/npspubs/scholarly/theses/2009/Dec/09Dec%5FCaver.pdf.

Abstract:

Thesis (M.S. in Computer Science)--Naval Postgraduate School, December 2009.
Thesis Advisor(s): Schein, Andrew I. ; Martell, Craig H. "December 2009." Description based on title screen as viewed on February 01, 2010. Author(s) subject terms: Authorship detection, topic detection, author-topic correlation, topic-author correlation, maximum entropy, New York Times Annotated Corpus. Includes bibliographical references (p. 61-63). Also available in print.

10

Zhao, Ying. "Effective Authorship Attribution in Large Document Collections." RMIT University. Computer Science and Information Technology, 2008. http://adt.lib.rmit.edu.au/adt/public/adt-VIT20080730.162501.

Abstract:

Techniques that can effectively identify authors of texts are of great importance in scenarios such as detecting plagiarism and identifying a source of information. A range of attribution approaches has been proposed in recent years, but none of these is particularly satisfactory; some of them are ad hoc and most have defects in terms of scalability, effectiveness, and computational cost. Good test collections are critical for evaluation of authorship attribution (AA) techniques. However, there are no standard benchmarks available in this area; it is almost always the case that researchers have their own test collections. Furthermore, collections that have been explored in AA are usually small, and thus whether the existing approaches are reliable or scalable is unclear. We develop several AA collections that are substantially larger than those in the literature; machine learning methods are used to establish the value of using such corpora in AA. The results, also used as baseline results in this thesis, show that the developed text collections can be used as standard benchmarks, and are able to clearly distinguish between different approaches. One of the major contributions is that we propose use of the Kullback-Leibler divergence, a measure of how different two distributions are, to identify authors based on elements of writing style. The results show that our approach is at least as effective as, if not always better than, the best existing attribution methods (that is, support vector machines) for two-class AA, and is superior for multi-class AA. Moreover, our proposed method has much lower computational cost and is cheaper to train. Style markers are the key elements of style analysis. We explore several approaches to tokenising documents to extract style markers, examining which marker type works best.
We also propose three systems that boost AA performance by combining evidence from various marker types, motivated by the observation that no single type of marker can satisfy all AA scenarios. To address the scalability of AA, we propose the novel task of authorship search (AS), inspired by document search and intended for large document collections. Our results show that AS is reasonably effective at finding documents by a particular author, even within a collection consisting of half a million documents. Beyond search, we also propose an AS-based method to identify authorship. Our method is substantially more scalable than any method published in prior AA research, in terms of the collection size and the number of candidate authors; the discrimination is scaled up to several hundred authors.
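
As a rough illustration of attribution by Kullback-Leibler divergence, the sketch below scores an unattributed document against each candidate author's word distribution and picks the author with the smallest divergence. It is not the thesis's implementation: whitespace tokens stand in for the style markers, add-alpha smoothing is an assumed choice, and the function names are hypothetical.

```python
import math
from collections import Counter


def word_counts(text):
    """Lowercased whitespace tokens; a stand-in for real style markers."""
    return Counter(text.lower().split())


def kl_divergence(doc_counts, author_counts, vocab, alpha=0.5):
    """KL(P_doc || P_author) with add-alpha smoothing over a shared vocabulary."""
    doc_total = sum(doc_counts.values()) + alpha * len(vocab)
    author_total = sum(author_counts.values()) + alpha * len(vocab)
    divergence = 0.0
    for token in vocab:
        p = (doc_counts[token] + alpha) / doc_total
        q = (author_counts[token] + alpha) / author_total
        divergence += p * math.log(p / q)
    return divergence


def attribute(document, author_texts):
    """Return the author whose profile diverges least from the document."""
    doc_counts = word_counts(document)
    profiles = {a: word_counts(t) for a, t in author_texts.items()}
    vocab = set(doc_counts)
    for counts in profiles.values():
        vocab |= set(counts)
    scores = {a: kl_divergence(doc_counts, c, vocab) for a, c in profiles.items()}
    return min(scores, key=scores.get), scores
```

Smoothing matters here because an unsmoothed author profile assigns zero probability to any token it has not seen, which would make the divergence undefined.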

11

Johnson, Russell Clark. "Authorship Attribution with Function Word N-Grams." NSUWorks, 2013. http://nsuworks.nova.edu/gscis_etd/188.

Abstract:

Prior research has considered the sequential order of function words, after the contextual words of the text have been removed, as a stylistic indicator of authorship. This research describes an effort to enhance authorship attribution accuracy based on this same information source with alternate classifiers, alternate n-gram construction methods, and a genetically tuned configuration. The approach is original in that it is the first time that probabilistic versions of Burrows's Delta have been used. Instead of using z-scores as an input for a classifier, the z-scores were converted to probabilistic equivalents (since z-scores cannot be subtracted, added, or divided without the possibility of distorting their probabilistic meaning); this adaptation enhanced accuracy. Multiple versions of Burrows's Delta were evaluated; this includes a hybrid of the Probabilistic Burrows's Delta and the version proposed by Smith & Aldridge (2011); in this case accuracy was enhanced when individual frequent words were evaluated as indicators of style. Other novel aspects include alternate n-gram construction methods; a reconciliation process that allows texts of various lengths from different authors to be compared; and a GA selection process that determines which function (or frequent) words (see Smith & Rickards, 2008; see also Shaker, Corne, & Everson, 2007) may be used in the construction of function word n-grams.
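
For orientation, the classic (non-probabilistic) form of Burrows's Delta that this work builds on can be sketched as below: relative frequencies of the most frequent words are converted to z-scores across the candidate texts, and Delta is the mean absolute difference between the z-scores of two texts. The probabilistic conversion, the n-gram construction, and the genetic tuning described in the abstract are not reproduced; the function names and the `n_words` default are assumptions.

```python
import statistics
from collections import Counter


def relative_freqs(text, words):
    """Relative frequency of each listed word in the text."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[w] / total for w in words]


def burrows_delta(test_text, candidate_texts, n_words=30):
    """Classic Delta between a test text and each candidate author's text."""
    corpus_counts = Counter()
    for text in candidate_texts.values():
        corpus_counts.update(text.lower().split())
    words = [w for w, _ in corpus_counts.most_common(n_words)]

    freqs = {a: relative_freqs(t, words) for a, t in candidate_texts.items()}
    columns = list(zip(*freqs.values()))  # one column per word
    means = [statistics.mean(col) for col in columns]
    # Fall back to 1.0 when a word has zero spread, to avoid division by zero.
    stdevs = [statistics.pstdev(col) or 1.0 for col in columns]

    def z_scores(vec):
        return [(f - m) / s for f, m, s in zip(vec, means, stdevs)]

    z_test = z_scores(relative_freqs(test_text, words))
    return {
        author: statistics.mean(abs(x - y) for x, y in zip(z_test, z_scores(vec)))
        for author, vec in freqs.items()
    }
```

The candidate with the lowest Delta is the most likely author under this measure.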

12

Teixeira, Filipe. "Boosting compression-based classifiers for authorship attribution." Master's thesis, Universidade de Aveiro, 2016. http://hdl.handle.net/10773/18375.

Abstract:

Master's in Computer and Telematics Engineering
Authorship attribution is the task of assigning an author to an anonymous document. Although the task was traditionally performed by expert linguists, many new techniques have been suggested since the appearance of computers in the middle of the 20th century, some of them using compressors to find repeating patterns in the data. This work presents the results that can be achieved by combining more than one compressor using a meta-algorithm known as Boosting.
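
The basic idea behind a compression-based classifier can be sketched with a single compressor: a document is attributed to the author whose known writing lets it be compressed most cheaply. The sketch below uses Python's standard `zlib` module and does not implement the Boosting step that combines several compressors; the helper names are hypothetical, and the thesis's actual compressors are not named in the abstract.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes after DEFLATE compression at the highest level."""
    return len(zlib.compress(data, 9))


def extra_bytes(author_corpus: str, document: str) -> int:
    """Extra bytes needed to compress the document after the author's corpus."""
    base = author_corpus.encode("utf-8")
    return compressed_size(base + document.encode("utf-8")) - compressed_size(base)


def attribute(document, author_corpora):
    """Pick the author whose corpus makes the document cheapest to encode."""
    costs = {a: extra_bytes(c, document) for a, c in author_corpora.items()}
    return min(costs, key=costs.get), costs
```

Note that `zlib` only looks back over a 32 KiB window, so with large author corpora only the most recent text would influence the result; a compressor with a longer memory would be a more realistic choice.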

13

Boutwell, Sarah R. "Authorship attribution of short messages using multimodal features." Thesis, Monterey, California. Naval Postgraduate School, 2011. http://hdl.handle.net/10945/5813.

Abstract:

Approved for public release; distribution is unlimited
In this thesis, we develop a multimodal classifier for authorship attribution of short messages. Standard natural language processing authorship attribution techniques are applied to a Twitter text corpus. Using character n-gram features and a Naïve Bayes classifier, we build statistical models of the set of authors. The social network of the selected Twitter users is analyzed using the screen names referenced in their messages. The timestamps of the messages are used to generate a pattern-of-life model. We analyze the physical layer of a network by measuring modulation characteristics of GSM cell phones. A statistical model of each cell phone is created using a Naïve Bayes classifier. Each phone is assigned to a Twitter user, and the probability outputs of the individual classifiers are combined to show that the combination of natural-language and network-feature classifiers identifies a user-to-phone binding better than when the individual classifiers are used independently.
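
The text-only component (character n-grams fed to a Naïve Bayes classifier) might look roughly like the sketch below. It covers none of the social-network, timestamp, or GSM features; the trigram length, Laplace smoothing, and the class name `CharNgramNB` are assumptions made for illustration.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """All overlapping character n-grams of the text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class CharNgramNB:
    """Multinomial Naive Bayes over character n-grams with add-alpha smoothing."""

    def __init__(self, n=3, alpha=1.0):
        self.n, self.alpha = n, alpha
        self.counts = {}             # author -> Counter of n-grams
        self.doc_counts = Counter()  # author -> number of training messages
        self.vocab = set()

    def fit(self, messages, authors):
        for msg, author in zip(messages, authors):
            grams = char_ngrams(msg.lower(), self.n)
            self.counts.setdefault(author, Counter()).update(grams)
            self.doc_counts[author] += 1
            self.vocab.update(grams)
        return self

    def log_scores(self, message):
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        grams = char_ngrams(message.lower(), self.n)
        scores = {}
        for author, counts in self.counts.items():
            total = sum(counts.values())
            score = math.log(self.doc_counts[author] / total_docs)  # prior
            for g in grams:
                score += math.log(
                    (counts[g] + self.alpha) / (total + self.alpha * vocab_size)
                )
            scores[author] = score
        return scores

    def predict(self, message):
        scores = self.log_scores(message)
        return max(scores, key=scores.get)
```

The per-author log scores from `log_scores` are the kind of output that could be combined with scores from other classifiers, as the abstract describes for the multimodal case.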

14

Sari, Yunita. "Neural and non-neural approaches to authorship attribution." Thesis, University of Sheffield, 2018. http://etheses.whiterose.ac.uk/21415/.

15

Sapkota, Upendra. "Improving the performance of cross-domain authorship attribution." Thesis, The University of Alabama at Birmingham, 2015. http://pqdtopen.proquest.com/#viewpdf?dispub=3739881.

Abstract:

Most previous research on authorship attribution (AA) assumes that the training and test data are drawn from the same distribution. But in real scenarios, this assumption is too strong. Because of domain mismatches, the AA approaches that perform well on same domain scenarios will degrade performance in cross-domain settings. The goal of this research is to improve the prediction results in cross-domain AA (CDAA), where there is no training data available from the target domain. We propose three different CDAA frameworks to overcome the lack of training samples from the target domain. Our first framework is driven by the hypothesis that a simple model built from all available out-of-domain data effectively discriminates among authors for a new domain. In addition to improving the performance of CDAA, we also study the effectiveness of the three most commonly used feature types in AA. In the second framework, we explore character n-grams by separating them into ten distinct categories based on the linguistic aspect they represent. Finally, the third framework tries to represent each instance with a common feature representation that is meaningful across domains. Based on the findings of our first and second framework, we propose to use and compare two formulations of features for CDAA.

We use prediction accuracy as the performance metric. We compare the performance of proposed frameworks with state-of-the-art approaches, whenever possible. We first demonstrate that addition of training data even if it comes from out-of-topic improves the performance of cross-topic AA. Also we find that character n-grams are the most effective author discriminator for both single as well as cross-domain AA. Once we demonstrate the efficacy of character n-grams in CDAA, we then propose to categorize them to further understand their predictive value. We then demonstrate the discriminative power of each n-gram category, and propose to discard some of the worst performing categories. In the third framework, we demonstrate that structural correspondence learning can induce feature correspondences for AA, and these feature correspondences combine with our character n-gram categorization to yield superior performance on cross-domain AA.

16

Shaker, Kareem. "Investigating features and techniques for Arabic authorship attribution." Thesis, Heriot-Watt University, 2012. http://hdl.handle.net/10399/2576.

Abstract:

Authorship attribution is the problem of identifying the true author of a disputed text. Throughout history, there have been many examples of this problem concerned with revealing genuine authors of works of literature that were published anonymously, and in some cases where more than one author claimed authorship of the disputed text. There has been considerable research effort into trying to solve this problem. Initially these efforts were based on statistical patterns, and more recently they have centred on a range of techniques from artificial intelligence. An important early breakthrough was achieved by Mosteller and Wallace in 1964 [15], who pioneered the use of ‘function words’ – typically pronouns, conjunctions and prepositions – as the features on which to base the discovery of patterns of usage relevant to specific authors. The authorship attribution problem has been tackled in many languages, but predominantly in the English language. In this thesis the problem is addressed for the first time in the Arabic language. We therefore investigate whether the concept of function words in English can also be used in the same way for authorship attribution in Arabic. We also describe and evaluate a hybrid of evolutionary algorithms and linear discriminant analysis as an approach to learn a model that classifies the author of a text, based on features derived from Arabic function words. The main target of the hybrid algorithm is to find a subset of features that can robustly and accurately classify disputed texts in unseen data. The hybrid algorithm also aims to do this with relatively small subsets of features. A specialised dataset was produced for this work, based on a collection of 14 Arabic books of different natures, representing a collection of six authors. This dataset was processed into training and test partitions in a way that provides a diverse collection of challenges for any authorship attribution approach.
The combination of the successful list of Arabic function words and the hybrid algorithm for classification led to satisfying levels of accuracy in determining the author of portions of the texts in test data. The work described here is the first (to our knowledge) that investigates authorship attribution in the Arabic language using computational methods. Among its contributions are: the first set of Arabic function words, the first specialised dataset aimed at testing Arabic authorship attribution methods, a new hybrid algorithm for classifying authors based on patterns derived from these function words, and, finally, a number of ideas and variants regarding how to use function words in association with character-level features, leading in some cases to more accurate results.

17

Balla, Stefano. "On code stylometry: authorship attribution of source code snippets." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/24264/.

Abstract:

The subject of stylometry has long been addressed in the world of natural language. In recent decades, this concept has also begun to be applied to source code, in an attempt to identify programming style. In this research, an innovative method for code representation is proposed. Using this method, it is then demonstrated how a neural model called code2vec can recognise the author of small pieces of source code. Finally, it is also shown how some tools widely used in the field of software engineering, autoformatters in this case, influence the stylistic contribution.

18

Lindh, Morén Jonas. "The Application of Closed Frequent Subtrees to Authorship Attribution." Thesis, Umeå universitet, Institutionen för datavetenskap, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-86458.

Abstract:

In this experimental study we compare the authorship attribution performance of two different types of distinguishing features: overlapping syntax subtrees of height one (or small trees) and closed frequent syntax subtrees. Authors and documents used in the experiments are randomly drawn from a large corpus of blog posts and news articles. Results show that small trees outperform closed frequent trees on this data set, both in terms of classifier performance and computational efficiency.

19

Grieve, Jack William. "Quantitative authorship attribution: a history and an evaluation of techniques." Burnaby B.C. : Simon Fraser University, 2005. http://ir.lib.sfu.ca/handle/1892/2055.

20

Cavalcante, Thiago 1989. "Authorship attribution on micro-messages = Atribuição de autoria em micro-mensagens." [s.n.], 2014. http://repositorio.unicamp.br/jspui/handle/REPOSIP/275539.

Abstract:

Advisors: Ariadne Maria Brito Rizzoni Carvalho, Anderson de Rezende Rocha
Dissertation (master's) - Universidade Estadual de Campinas, Instituto de Matemática Estatística e Computação Científica
Abstract: With the ever-growing use of social media, authorship attribution plays an important role in preventing cybercrime and in helping the analysis of online trails left behind by cyber pranksters, stalkers, bullies, identity thieves and the like. In this dissertation, we propose a method for authorship attribution in micro-blogs that is one hundred to a thousand times faster than state-of-the-art counterparts. We also achieved an accuracy of 65% when classifying texts from 50 authors. The method relies on a powerful and scalable feature representation approach taking advantage of user patterns in micro-blog messages, and also on a custom-tailored pattern classifier adapted to deal with big data and high-dimensional data. Finally, we discuss search space reduction when analysing hundreds of online suspects and millions of online micro-messages, which makes this approach invaluable for digital forensics and law enforcement.
Master's degree
Computer Science
Master of Computer Science

21

Hendrikse, Steven. "The Effect of Code Obfuscation on Authorship Attribution of Binary Computer Files." NSUWorks, 2017. http://nsuworks.nova.edu/gscis_etd/1009.

Abstract:

In many forensic investigations, questions linger regarding the identity of the authors of the software specimen. Research has identified methods for the attribution of binary files that have not been obfuscated, but a significant percentage of malicious software has been obfuscated in an effort to hide both the details of its origin and its true intent. Little research has been done around analyzing obfuscated code for attribution. In part, the reason for this gap in the research is that deobfuscation of an unknown program is a challenging task. Further, the additional transformation of the executable file introduced by the obfuscator modifies or removes features from the original executable that would have been used in the author attribution process. Existing research has demonstrated good success in attributing the authorship of an executable file of unknown provenance using methods based on static analysis of the specimen file. With the addition of file obfuscation, static analysis of files becomes difficult, time consuming, and in some cases, may lead to inaccurate findings. This paper presents a novel process for authorship attribution using dynamic analysis methods. A software emulated system was fully instrumented to become a test harness for a specimen of unknown provenance, allowing for supervised control, monitoring, and trace data collection during execution. This trace data was used as input into a supervised machine learning algorithm trained to identify stylometric differences in the specimen under test and provide predictions on who wrote the specimen. The specimen files were also analyzed for authorship using static analysis methods to compare prediction accuracies with prediction accuracies gathered from this new, dynamic analysis based method. Experiments indicate that this new method can provide better accuracy of author attribution for files of unknown provenance, especially in the case where the specimen file has been obfuscated.

22

Corney, Malcolm W. "Analysing e-mail text authorship for forensic purposes." Thesis, Queensland University of Technology, 2003. https://eprints.qut.edu.au/16069/1/Malcolm_Corney_Thesis.pdf.

Abstract:

E-mail has become the most popular Internet application and with its rise in use has come an inevitable increase in the use of e-mail for criminal purposes. It is possible for an e-mail message to be sent anonymously or through spoofed servers. Computer forensics analysts need a tool that can be used to identify the author of such e-mail messages. This thesis describes the development of such a tool using techniques from the fields of stylometry and machine learning. An author's style can be reduced to a pattern by making measurements of various stylometric features from the text. E-mail messages also contain macro-structural features that can be measured. These features together can be used with the Support Vector Machine learning algorithm to classify or attribute authorship of e-mail messages to an author providing a suitable sample of messages is available for comparison. In an investigation, the set of authors may need to be reduced from an initial large list of possible suspects. This research has trialled authorship characterisation based on sociolinguistic cohorts, such as gender and language background, as a technique for profiling the anonymous message so that the suspect list can be reduced.

23

Corney, Malcolm W. "Analysing E-mail Text Authorship for Forensic Purposes." Queensland University of Technology, 2003. http://eprints.qut.edu.au/16069/.

Abstract:

E-mail has become the most popular Internet application and with its rise in use has come an inevitable increase in the use of e-mail for criminal purposes. It is possible for an e-mail message to be sent anonymously or through spoofed servers. Computer forensics analysts need a tool that can be used to identify the author of such e-mail messages. This thesis describes the development of such a tool using techniques from the fields of stylometry and machine learning. An author's style can be reduced to a pattern by making measurements of various stylometric features from the text. E-mail messages also contain macro-structural features that can be measured. These features together can be used with the Support Vector Machine learning algorithm to classify or attribute authorship of e-mail messages to an author, provided that a suitable sample of messages is available for comparison. In an investigation, the set of authors may need to be reduced from an initial large list of possible suspects. This research has trialled authorship characterisation based on sociolinguistic cohorts, such as gender and language background, as a technique for profiling the anonymous message so that the suspect list can be reduced.

24

Lindholm, Lars. "'Art Made Tongue-tied By Authority?': The Shakespeare Authorship Question." Thesis, Stockholms universitet, Engelska institutionen, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-78261.

Abstract:

The essay presents the scholarly controversy over the correct attribution of the works by “Shakespeare”. The main alternative author is Edward de Vere, 17th Earl of Oxford. Sixteenth-century conventions allowed noblemen to write poetry or drama only for private circulation. To appear in print, such works had to be anonymous or under a pseudonym. Overtly writing for public theatre, a profitable business, would have been degrading conduct. Oxford’s contemporary fame as an author is little matched by known works. Great gaps in relevant sources indicate that documents concerning not only his person and authorship but also the life of Shakspere from Stratford, the alleged author, have been deliberately eliminated in order to transfer the authorship, for which the political authority of the Elizabethan and Jacobean autocratic society had motive and resources enough. A restored identity would imply radical redating of plays and poems. To what extent literature is autobiographical, or was in that age, and whether restoring a lost identity from written works is legitimate at all, are basic issues of the debate, always implying tradition without real proof versus circumstantial evidence. As such arguments are incompatible, both sides have incessantly missed their targets. The historical conditions for the sequence of events that created the fiction, and its main steps, are related. Oxford will be in focus, since most old and new evidence for making a case has reference to him. The views of the two parties on different points are presented by continual quoting from representative recent works by Shakespeare scholars, where the often scornful tone of the debate still echoes. It is claimed that the urge for concrete results will make the opinion veer to the side that proves productive and eventually can create a new coherent picture, but better communication between the parties’ scholars is called for.
Literary Degree Project

25

Kimler, Marco. "Using Style Markers for Detecting Plagiarism in Natural Language Documents." Thesis, University of Skövde, Department of Computer Science, 2003. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-824.

Abstract:

Most of the existing plagiarism detection systems compare a text to a database of other texts. These external approaches, however, are vulnerable because texts not contained in the database cannot be detected as source texts. This paper examines an internal plagiarism detection method that uses style markers from authorship attribution studies in order to find stylistic changes in a text. These changes might pinpoint plagiarized passages. Additionally, a new style marker called specific words is introduced. A pre-study tests if the style markers can fingerprint an author's style and if they are constant with sample size. It is shown that vocabulary richness measures do not fulfil these prerequisites. The other style markers - simple ratio measures, readability scores, frequency lists, and entropy measures - have these characteristics and are, together with the new specific words measure, used in a main study with an unsupervised approach for detecting stylistic changes in plagiarized texts at sentence and paragraph levels. It is shown that at these small levels the style markers generally cannot detect plagiarized sections because of intra-authorial stylistic variations (i.e. noise), and that at bigger levels the results are strongly affected by the sliding window approach. The specific words measure, however, can pinpoint single sentences written by another author.
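
As a rough illustration of the internal approach described in this abstract, the Python sketch below computes one simple ratio measure (average word length) over a sliding window of sentences and flags windows that deviate strongly from the rest of the document. The function names, the window size, and the z-score threshold are assumptions made for this example and are not taken from the thesis.

```python
import re
import statistics


def avg_word_length(text):
    """Simple ratio style marker: mean number of characters per word."""
    words = re.findall(r"[A-Za-z']+", text)
    return sum(len(w) for w in words) / len(words) if words else 0.0


def flag_style_changes(sentences, window=3, z_threshold=2.0):
    """Return start indices of windows whose marker deviates from the document norm.

    Hypothetical helper: slides a window of `window` sentences over the text,
    scores each window, and flags scores more than `z_threshold` standard
    deviations away from the mean of all window scores.
    """
    scores = [
        avg_word_length(" ".join(sentences[i:i + window]))
        for i in range(len(sentences) - window + 1)
    ]
    if len(scores) < 2:
        return []
    mean = statistics.mean(scores)
    spread = statistics.pstdev(scores)
    if spread == 0:
        return []  # perfectly uniform style, nothing to flag
    return [i for i, s in enumerate(scores) if abs(s - mean) / spread > z_threshold]
```

With very small windows a single marker like this is noisy, which is consistent with the abstract's finding that intra-authorial variation limits detection at sentence and paragraph level.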

26

Bugo, Laura. "Authorship analysis: studio delle metodologie e sviluppo di un sistema di riconoscimento." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2018.

Abstract:

The aim of this work is to implement an author-recognition program that identifies, among a group of suspects, the author of an unknown text, given as input a few texts for each suspect. Stylistic features were extracted from the authors' texts; they were built on the basis of experiments reported in the literature and through the use of new technologies not yet tested on the authorship attribution problem. These stylistic features are then used to recognize the authors of texts whose authorship is unknown. Two new technologies are mainly used: the word2vec algorithm, which gives an idea of the semantic distance separating words from one another, and the XGBoost classifier, which is more flexible and effective than the classifiers used in other experiments. The results obtained with the aid of these new technologies are very strong and show a good improvement over the state of the art.

27

Marinho, Vanessa Queiroz. "Development of new models for authorship recognition using complex networks." Universidade de São Paulo, 2017. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-14112017-091805/.

Abstract:

Complex networks have been successfully applied in different fields and are studied in areas that include, for example, physics and computer science. The finding that complex-network methods can be used to analyze texts at different levels of complexity has led to advances in natural language processing (NLP) tasks. Examples of applications analyzed with complex-network methods are keyword identification, the development of automatic summarizers, and authorship attribution systems. The latter task has been studied with some success through co-occurrence (or adjacency) networks that connect only the closest words in the text. Despite this success, only a few works have attempted to extend this representation or employ different ones. Moreover, many approaches use a similar set of measurements to characterize the networks and do not combine their techniques with the ones traditionally used for the authorship attribution task. This Master's research proposes some extensions to the traditional co-occurrence model and investigates new attributes and other representations (such as mesoscopic and named entity networks) for the task. The connectivity information of function words is used to complement the characterization of authors' writing styles, as these words are relevant for the task. Finally, the main contribution of this research is the development of hybrid classifiers, called labelled motifs, that combine traditional factors with properties obtained from the topological analysis of complex networks. The relevance of these classifiers is verified in the context of authorship attribution and translationese identification. With this hybrid approach, we show that it is possible to improve the performance of network-based techniques when they are combined with traditional ones usually employed in NLP.
By adapting, combining and improving the model, not only was the performance of authorship attribution systems improved, but it was also possible to better understand which quantitative textual factors (measured through networks) can be used in stylometry studies. The advances obtained during this project may be useful to study related applications, such as the analysis of stylistic inconsistencies and plagiarism, and the analysis of text complexity. Furthermore, most of the methods proposed in this work can be easily applied to many natural languages.
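
To make the co-occurrence (adjacency) representation mentioned in this abstract more concrete, here is a minimal Python sketch that links each word to the word immediately following it and then reads off the connectivity (degree) of a few function words. The function names and the tiny function-word list are illustrative assumptions; the thesis itself uses richer network measurements and the labelled-motif classifiers, which are not reproduced here.

```python
import re
from collections import defaultdict

# Illustrative subset only; a real study would use a much longer list.
FUNCTION_WORDS = {"the", "of", "and", "to", "a", "in"}


def build_cooccurrence_network(text):
    """Link each word to the word that immediately follows it (undirected)."""
    words = re.findall(r"[a-z']+", text.lower())
    graph = defaultdict(set)
    for left, right in zip(words, words[1:]):
        if left != right:  # skip self-loops from immediate repetitions
            graph[left].add(right)
            graph[right].add(left)
    return graph


def function_word_degrees(graph, function_words=FUNCTION_WORDS):
    """Degree (number of distinct neighbours) of each function word present."""
    return {w: len(graph[w]) for w in sorted(function_words) if w in graph}
```

A vector of such degrees, one per function word, could be used as one possible feature set for a conventional classifier.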

28

Schneider, Michael J. "A Study on the Efficacy of Sentiment Analysis in Author Attribution." Digital Commons @ East Tennessee State University, 2015. https://dc.etsu.edu/etd/2538.

Abstract:

The field of authorship attribution seeks to characterize an author’s writing style well enough to determine whether he or she has written a text of interest. One subfield of authorship attribution, stylometry, seeks to find the necessary literary attributes to quantify an author’s writing style. The research presented here sought to determine the efficacy of sentiment analysis as a new stylometric feature, by comparing its performance in attributing authorship against the performance of traditional stylometric features. Experimentation, with a corpus of sci-fi texts, found sentiment analysis to have a much lower performance in assigning authorship than the traditional stylometric features.

29

Taromi, Kurosh. "Authorship Attribution in Modern Persian Prose: An Innovative Method to Find Style Discriminators Between Any Set of Authors." Saarbrücken: VDM Verlag Dr. Müller, 2010. http://www.vdm-verlag.de.

30

Levy-Minzie, Kori. "Authorship attribution in the e-mail domain: a study of the effect of size of author corpus and topic on accuracy of identification." Thesis, Monterey, California. Naval Postgraduate School, 2011. http://hdl.handle.net/10945/5780.

Abstract:

Approved for public release; distribution is unlimited.
We determined that it is possible to achieve authorship attribution in the e-mail domain when training on "personal" e-mails and testing on "work" e-mails and vice versa. These results are unique since they simulate two different e-mail addresses belonging to the same person where the topics of the e-mails from the two different addresses do not intersect. As we only used one classification technique, these results are preliminary and may serve as a baseline for future work in this area. The corpus of data was the entirety of the Enron corpus as well as a subsection of hand-annotated work and personal e-mails. We discovered that there is enough author signal in each class to identify an author in a sea of noise. We included suggestions for future work in the areas of expanding feature selection, increasing corpus size, and including more classification methods. Advancement in this area will contribute to increasing cyber security by identifying the senders of anonymous derogatory e-mails and reducing cyber bullying.

31

Funai, Tomohiko. "Extensions of Nearest Shrunken Centroid Method for Classification." BYU ScholarsArchive, 2010. https://scholarsarchive.byu.edu/etd/2402.

Abstract:

Stylometry assumes that the essence of the individual style of an author can be captured using a number of quantitative criteria, such as the relative frequencies of noncontextual words (e.g., or, the, and). Several statistical methodologies have been developed for authorship analysis. Jockers et al. (2009) utilize Nearest Shrunken Centroid (NSC) classification, a promising classification methodology in DNA microarray analysis, for authorship analysis of the Book of Mormon. Schaalje et al. (2010) develop an extended NSC classification to remedy the problem of a missing author. Dabney (2005) and Koppel et al. (2009) suggest other modifications of NSC. This paper develops a full Bayesian classifier and compares its performance to five versions of the NSC classifier using the Federalist Papers, the Book of Mormon text blocks, and the texts of seven other authors. The full Bayesian classifier was superior to all other methods.
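
The idea of representing a text by the relative frequencies of noncontextual words can be sketched in a few lines of Python. The example below pairs those frequencies with a plain nearest-centroid rule; it is deliberately simpler than the Nearest Shrunken Centroid variants and the Bayesian classifier discussed in the abstract, since it applies no shrinkage and no priors. The word list and function names are assumptions chosen for illustration.

```python
import math
import re

# Illustrative subset of noncontextual (function) words.
FUNCTION_WORDS = ("the", "and", "or", "of", "to")


def relative_frequencies(text, vocabulary=FUNCTION_WORDS):
    """Share of all tokens in `text` taken up by each word in `vocabulary`."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words) or 1  # avoid division by zero on empty input
    return [words.count(w) / total for w in vocabulary]


def centroids(samples_by_author):
    """Average the frequency vectors of each author's known texts."""
    result = {}
    for author, texts in samples_by_author.items():
        vectors = [relative_frequencies(t) for t in texts]
        result[author] = [sum(col) / len(vectors) for col in zip(*vectors)]
    return result


def attribute(text, author_centroids):
    """Assign `text` to the author whose centroid is closest (Euclidean)."""
    vector = relative_frequencies(text)
    return min(author_centroids, key=lambda a: math.dist(vector, author_centroids[a]))
```

A shrunken-centroid method would additionally pull each author's centroid toward the overall centroid before the distance comparison, so that uninformative words drop out of the decision.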

32

Dubois, François-Ronan. "L'Appropriation de l'œuvre : Instances et visées de l'attribution des œuvres à leur auteur dans la France de l'Ancien Régime (1645-1777)." Thesis, Université Grenoble Alpes (ComUE), 2017. http://www.theses.fr/2017GREAL038/document.

Abstract:

Literary property rights in early modern France are often understood through the prism of the contemporary droit d’auteur. Many studies see the early modern period as a laboratory for an ongoing experiment in law and ideology, still ill-fitted to the literary practices of the authors. This thesis takes a fresh look at the question of literary property, treating the whole book-trade system (the librairie) from the 1650s to the 1780s as an effective articulation of agents, tools, and discursive practices. Through the study of institutional policies in the domain of literary property as well as judicial responsibility, through a careful reading of the bibliographical discourse of dictionaries, anas, and periodicals, and through the description of editorial endeavors undertaken by authors themselves, it shows the dynamics of the early modern book trade and literary world. With roots in literary history, history of law, and book history, this dissertation seeks to understand how the concept of literary property is constructed, against the very interests of the authors, to consolidate a commercial book trade in which the State slowly delegates its regulatory powers. Through the study of literary attribution, this work conducts its demonstration with particular attention to the close reading of literary paratexts.

33

Java, James. "Characterization of Prose by Rhetorical Structure for Machine Learning Classification." NSUWorks, 2015. http://nsuworks.nova.edu/gscis_etd/347.

Abstract:

Measures of classical rhetorical structure in text can improve accuracy in certain types of stylistic classification tasks such as authorship attribution. This research augments the relatively scarce work in the automated identification of rhetorical figures and uses the resulting statistics to characterize an author's rhetorical style. These characterizations of style can then become part of the feature set of various classification models. Our Rhetorica software identifies 14 classical rhetorical figures in free English text, with generally good precision and recall, and provides summary measures to use in descriptive or classification tasks. Classification models trained on Rhetorica's rhetorical measures paired with lexical features typically performed better at authorship attribution than either set of features used individually. The rhetorical measures also provide new stylistic quantities for describing texts, authors, genres, etc.

34

Valencia, Camilo Akimushkin. "Propriedades de redes aplicadas à atribuição de autoria." Universidade de São Paulo, 2017. http://www.teses.usp.br/teses/disponiveis/76/76131/tde-12092017-081937/.

Abstract:

Authorship attribution is an active research area with many applications, including detection of plagiarism, analysis of historical texts, and identification of terrorist messages or document falsification. Theoretical models of complex networks are already used for authorship attribution, but some important issues have been ignored. In this thesis, we explore the dynamics of co-occurrence networks and the role of the words that represent the nodes, and find that both are clear signatures of authorship. Using optimized descriptors for the network topology and machine learning algorithms, it has been possible to achieve accuracy rates above 85%, with a rate of 98.75% being reached in a particular case, for collections of 80 books produced by 8 English-speaking writers with 10 books per author. It is also shown that there are still many unexplored aspects of co-occurrence networks of texts, which seems promising for developments in the near future.

35

Belvisi, Nicole Mariah Sharon. "Document Forensics Through Textual Analysis." Thesis, Högskolan i Halmstad, Akademin för informationsteknologi, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-40157.

Abstract:

This project aims to give a brief overview of the research area called Authorship Analysis, with a main focus on Authorship Attribution and the existing methods. The second objective of this project is to test whether one of the main approaches in the field can still be applied successfully to today's new ways of communicating. The study uses multiple stylometric features to establish the authorship of a text, as well as a model based on TF-IDF.
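
For readers unfamiliar with TF-IDF, the following Python sketch shows one common formulation (term frequency times log(N / document frequency)) together with cosine similarity for picking the closest known text. This is an illustrative assumption about how such a model might look, not the implementation used in the thesis; all function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    token_lists = [tokenize(d) for d in documents]
    n_docs = len(token_lists)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def most_similar(known_texts, unknown_text):
    """Return the label of the known text closest to `unknown_text`."""
    labels = list(known_texts)
    vectors = tfidf_vectors([known_texts[label] for label in labels] + [unknown_text])
    query = vectors[-1]
    return max(labels, key=lambda label: cosine(vectors[labels.index(label)], query))
```

Terms that occur in every document receive a weight of zero under this formulation, so very short or very similar texts may yield all-zero vectors; smoothed IDF variants avoid that.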

36

Žalkauskaitė, Gintarė. "Idiolekto požymiai elektroniniuose laiškuose." Doctoral thesis, Lithuanian Academic Libraries Network (LABT), 2012. http://vddb.laba.lt/obj/LT-eLABa-0001:E.02~2012~D_20120118_131320-23362.

Abstract:

The current study aims to establish whether an author's idiolect can be recognized in the language of e-mails and to determine the lexical and graphical features that can be linked to idiolect. The data were derived from a corpus of 65,000 words consisting of e-mails written in Lithuanian by six persons. The WordSmith Tools software was used to generate frequency lists for six subcorpora, each representing one person’s language. Using the contrastive method, the frequency data for the six persons' language were compared. Lexical and graphical elements that were used by one person more or less often than by the others, and that were not determined by the topic, were linked to that author's idiolect. As a result of the analysis, a classification of lexical and graphical elements that can help in recognizing idiolect is given. The study shows that on the lexical level the main differences between idiolects lie in the use of words expressing modality and stance, and in the words and abbreviations chosen from among possible variants. On the graphical level, idiolects can be recognized from punctuation marks, emoticons and graphic symbols used at different frequencies. Based on the research results, recommendations for authorship attribution examinations are given.

37

Žalkauskaitė, Gintarė. "Features of Idiolect in E-mails." Doctoral thesis, Lithuanian Academic Libraries Network (LABT), 2012. http://vddb.laba.lt/obj/LT-eLABa-0001:E.02~2012~D_20120118_131329-68478.

Abstract:

The current study aims to establish whether an author's idiolect can be recognized in the language of e-mails and to determine the lexical and graphical features that can be linked to idiolect. The data were derived from a corpus of 65,000 words consisting of e-mails written in Lithuanian by six persons. The WordSmith Tools software was used to generate frequency lists for six subcorpora, each representing one person’s language. Using the contrastive method, the frequency data for the six persons' language were compared. Lexical and graphical elements that were used by one person more or less often than by the others, and that were not determined by the topic, were linked to that author's idiolect. As a result of the analysis, a classification of lexical and graphical elements that can help in recognizing idiolect is given. The study shows that on the lexical level the main differences between idiolects lie in the use of words expressing modality and stance, and in the words and abbreviations chosen from among possible variants. On the graphical level, idiolects can be recognized from punctuation marks, emoticons and graphic symbols used at different frequencies. Based on the research results, recommendations for authorship attribution examinations are given.

38

Chen, Beichen. "Stylometric Embeddings for Book Similarities." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-303125.

Abstract:

Stylometry is the field of research aimed at defining features for quantifying writing style, and the most studied question in stylometry has been authorship attribution, where, given a set of texts with known authorship, we are asked to determine the author of a new unseen document. In this study a number of lexical and syntactic stylometric feature sets were extracted for two datasets, a smaller one containing 27 books from 25 authors, and a larger one containing 11,063 books from 316 authors. Neural networks were used to transform the features into embeddings, after which the nearest neighbor method was used to attribute texts to their closest neighbor. The smaller dataset achieved an accuracy of 91.25% using frequencies of the 50 most common function words, dependency relations, and part-of-speech (POS) tags as features, and the larger dataset achieved 69.18% accuracy using a similar feature set with the 100 most common function words. In addition to performing author attribution, a user test showed the potential of the model for generating author similarities, and hence its usefulness in an applied setting for recommending books to readers based on author style.

39

Koh, Kok Chuan. "Modeling Alcohol Consumption Using Blog Data." Thesis, University of North Texas, 2013. https://digital.library.unt.edu/ark:/67531/metadc271843/.

Abstract:

How do the content and writing style of people who drink alcoholic beverages stand out from those of non-drinkers? How much information can we learn about a person's alcohol consumption behavior by reading text that they have authored? This thesis attempts to extend the methods deployed in authorship attribution and authorship profiling research into the domain of automatically identifying the human action of drinking alcoholic beverages. I examine how a psycholinguistic dictionary (the Linguistic Inquiry and Word Count lexicon, developed by James Pennebaker), together with Kenneth Burke's concept of words as symbols of human action and James Wertsch's concept of mediated action, provides a framework for analyzing meaningful data patterns from the content of blogs written by consumers of alcoholic beverages. The contributions of this thesis to the research field are twofold. First, I show that it is possible to automatically identify blog posts that have content related to the consumption of alcoholic beverages. Second, I provide a framework and tools to model human behavior through text analysis of blog data.


40

CONSALVI, ANDREA. "ATTRIBUZIONE DI TESTI LETTERARI: UNA PROPOSTA METODOLOGICA." Doctoral thesis, Università Cattolica del Sacro Cuore, 2022. http://hdl.handle.net/10280/122312.

Full text

Abstract:

The dissertation provides theoretical and practical resources for scientific investigation in the field of authorship attribution studies of literary works. The first chapter provides a glossary of the correct terminology to be used when addressing various degrees of declared and elaborated intertextuality. The second chapter, through an excursus from the Hellenistic age to the present day, retraces the main stages in the history of authorship attribution studies in order to gain greater awareness of how the field has changed, especially in terms of the disciplines involved and the analyses that are possible. The third chapter offers a methodological proposal, with related IT tools, for conducting a study of this nature, from the initial reading of the text to the interpretation of the data and the conclusions. The research has made it possible to trace and better define the field of authorship attribution studies, from both a historical and a practical-methodological perspective. Additionally, the contribution of digital humanities, and consequently of its interactions with other disciplines, has proved to be fundamental.


41

Bertocchini, Pietro. "Il dilemma dell'autenticità del «Clitofonte»: studio del dialogo e ipotesi di attribuzione." Doctoral thesis, Università degli studi di Padova, 2019. http://hdl.handle.net/11577/3424836.

Full text

Abstract:

The dissertation approaches the dilemma of the authenticity of the «Clitophon» from several perspectives. It offers an introduction, a translation, and a thorough and up-to-date analysis of the text and of the many issues it raises. Stylometric tools were used alongside traditional research methods. If, as it seems, the author of the dialogue is not Plato, it may have been written by a member of the Academy who was his contemporary.


42

Хомицька, Ірина Юріївна. "Методи та засоби диференціації фоностатистичних структур функціональних стилів англійської мови." Diss., Національний університет «Львівська політехніка», 2021. https://ena.lpnu.ua/handle/ntb/56676.

Full text

Abstract:

The dissertation addresses a topical scientific problem: improving the reliability of differentiating the phonostatistical structures of English styles. It develops a method of complex analysis of the differentiation of phonostatistical structures of English styles, which makes it possible to improve the reliability of that differentiation. A multifactor method is developed for determining the degree of effect of the factors of style, substyle and authorial manner of exposition. The method improves the reliability of authorship attribution of a text by its overall stylistic markedness. The statistical model of style, substyle and authorial differentiation by the hypothesis method and the ranking method is improved, which makes it possible to improve the reliability of determining the differential features of styles, substyles and texts by different authors. The statistical model for determining the style-differentiating capability of groups of consonant phonemes is improved, which makes it possible to reduce the number of consonant phoneme groups and to raise the level of automation of text differentiation by the consonant phoneme group with the highest style-differentiating capability. A software system for differentiating English styles at the phonological level is developed, and the results of the research are presented.
The thesis is devoted to the topical scientific problem of developing mathematical methods and models that can improve the test validity of differentiation of phonostatistical structures of English functional styles. In the introduction, the relevance of the thesis theme is substantiated and the purpose is stated. Information about the program's testing and the author's contribution is given. In the first part of the thesis, the current state of the style differentiation problem is analyzed. The results of the analysis show that the task of enhancing test validity needs to be solved. Consequently, a method based on a combination of statistical methods and a multifactor method for determining the effect of the style factor and the authorial factor should be developed to improve the test validity of style differentiation. In the second part, the method of complex analysis of differentiation of phonostatistical structures of styles is developed. It is based on the proposed combination of the hypothesis, ranking and style distance determination methods. The method makes it possible to improve the test validity of style differentiation. The developed multifactor method for determining the effect of the style and authorial factors makes it possible to improve the test validity of authorship attribution. In the third part, the statistical model of style and authorial differentiation by the hypothesis and ranking methods and the statistical model for determining the style-differentiating capability of a consonant group are improved. The models enhance the test validity of style and authorship attribution and the level of automation of text differentiation. In the fourth part, the software for style and authorial differentiation is developed. The software implements the developed methods in the Java programming language. According to the results obtained, essential differences have been established between the belles-lettres, colloquial, newspaper, publicist and scientific functional styles when compared in pairs. Authorship attribution has been carried out for texts by G. G. Byron, T. Moore, E. Bronte, W. Thackeray, B. Obama, D. Trump, S. Logan and D. Webster.


43

Viverit, Guido. "Problemi di attribuzione conflittuale nella musica strumentale veneta del Settecento." Doctoral thesis, Università degli studi di Padova, 2015. http://hdl.handle.net/11577/3423996.

Full text

Abstract:

This thesis addresses conflicting attributions, a problem that arises when a composition is ascribed to different authors in different sources. The aim of the research was to investigate this phenomenon in depth in order to understand its causes, taking the instrumental music of eighteenth-century Veneto as its field of inquiry and paying attention to both the historical-musicological and the conceptual standpoint. Three case studies were carefully selected as representative of the wider repertory: the Concerto for oboe in D minor attributed to Alessandro and Benedetto Marcello, Antonio Vivaldi and Johann Sebastian Bach; the collection of trio sonatas attributed to Domenico Gallo and Giovanni Battista Pergolesi; and the collection of Concerti a cinque op. 1 libro terzo, attributed to Giuseppe Tartini and Gasparo Visconti. The investigation of the individual cases led to the identification of new sources and new information about the persons involved in the attributions. The detailed reconstruction of the history of the attributions and the examination of the sources made it possible to advance several hypotheses on the origin of the conflicting attributions. More generally, the thesis attempts to investigate in depth all aspects of the context in which a work was produced and transmitted: the economic interests surrounding the circulation of a musical work, the modes of production of manuscript and printed sources, and the practical and legal tools composers adopted to protect their work and their own authorial status. In conclusion, it questions the concepts of author and intellectual property in the instrumental music of the mid-eighteenth century.


44

Nobre, Neto Francisco Dantas. "Atribuição automática de autoria de obras da literatura brasileira." Universidade Federal da Paraíba, 2010. http://tede.biblioteca.ufpb.br:8080/handle/tede/6121.

Full text

Abstract:

Previous issue date: 2010-01-19
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
Authorship attribution consists of assigning an unknown document to one of several previously selected classes of authors. Knowing the authorship of a text can be useful when plagiarism must be detected in a literary work or when credit must be properly given to the author of a book. The most intuitive way for a human to analyse a text is to select some of its characteristics. The study of selecting attributes in a written document, such as average word length and vocabulary richness, is known as stylometry. For a human analysing an unknown text, discovering the authorship can take months and becomes a tiring task. Some computational tools extract such characteristics from the text and leave the subjective analysis to the researcher. However, there are computational methods that, in addition to extracting attributes, perform the attribution itself, based on the characteristics gathered from the text. Techniques such as neural networks, decision trees and classification methods have been applied in this context and have produced results that make them relevant to the question. This work presents a data compression method, Prediction by Partial Matching (PPM), as a solution to the authorship attribution problem for Brazilian literary works. The writers and works that make up the author database were selected mainly for their representativeness in the national literature; the availability of the books in electronic format was also considered. PPM performs authorship identification without any subjective interference in the text analysis. Unlike other methods, it also makes no use of attributes present in the text. The correct classification rate obtained with PPM in this work was approximately 93%, while related works report rates between 72% and 89%. Authorship attribution was also carried out with an SVM approach. For that, the attributes selected from the text were divided into two groups, one word-based and the other based on function-word frequency, which obtained correct rates of 36.6% and 88.4%, respectively.
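The general idea behind compression-based attribution can be sketched as follows. This is only an illustration under stated assumptions: Python's standard library has no PPM implementation, so zlib (DEFLATE) is swapped in as a stand-in compressor, and the function names are hypothetical. The unknown text is assigned to the author whose known corpus lets the compressor encode it in the fewest additional bytes.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of the data after zlib compression at the highest level."""
    return len(zlib.compress(data, 9))


def extra_bytes(corpus: str, unknown: str) -> int:
    """Additional compressed bytes needed to encode `unknown` after `corpus`."""
    base = corpus.encode("utf-8")
    return compressed_size(base + unknown.encode("utf-8")) - compressed_size(base)


def attribute_by_compression(unknown: str, corpora: dict) -> str:
    """Pick the author whose corpus makes the unknown text cheapest to compress.

    zlib is a stand-in here (assumption); the thesis uses PPM.
    """
    return min(corpora, key=lambda author: extra_bytes(corpora[author], unknown))
```

zlib only looks back over a 32 KiB window, so with book-length corpora this stand-in would see just the tail of each corpus; a context-modelling compressor such as PPM does not have that limitation.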


45

Queralt, Sheila 1987. "Estudio piloto para la evaluación de evidencias lingüísticas en la comparación forense de textos mediante distribuciones poblacionales y relaciones de verosimilitudes." Doctoral thesis, Universitat Pompeu Fabra, 2015. http://hdl.handle.net/10803/318374.

Full text

Abstract:

This thesis proposes implementing statistical techniques in the analysis of linguistic variables in order to build a population distribution model that is useful in the forensic comparison of written texts. In a final phase, it aims to apply the theoretical and methodological framework of the likelihood ratio. The goal is to improve results in the task of attributing or determining authorship, so as to advise the various judicial agents more objectively and to protect people involved in judicial proceedings from a possible miscarriage of justice.
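As a rough illustration of the likelihood-ratio framework, the sketch below divides the probability density of an observed comparison score under a same-author population model by its density under a different-author model. The Gaussian models, their parameters and the function name are illustrative assumptions, not values or methods taken from the thesis.

```python
from statistics import NormalDist


def likelihood_ratio(score, same_author, different_author):
    """LR = p(score | same author) / p(score | different authors)."""
    return same_author.pdf(score) / different_author.pdf(score)


# Hypothetical population models for a stylistic distance score (assumption):
# small distances are typical of same-author pairs, larger ones of different authors.
same = NormalDist(mu=0.2, sigma=0.1)
diff = NormalDist(mu=0.6, sigma=0.15)

lr_close = likelihood_ratio(0.2, same, diff)  # well above 1: supports same author
lr_far = likelihood_ratio(0.6, same, diff)    # well below 1: supports different authors
```

A ratio above 1 supports the same-author hypothesis and a ratio below 1 supports the different-author hypothesis; how strongly depends on how far the ratio is from 1.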


46

Jacovino, Julia Maureen. "Authorship Attribution Through Words Surrounding Named Entities." 2013. http://digital.library.duq.edu/u?/etd,162270.

Full text

Abstract:

In text analysis, authorship attribution can be approached in a variety of ways. Computational linguistics grows in importance as the need for authorship attribution and text analysis becomes more widespread. In this research, existing authorship attribution software, the Java Graphical Authorship Attribution Program (JGAAP), is combined with a named entity recognizer, specifically the Stanford Named Entity Recognizer, to probe texts of similar genre and to help identify the correct author. The research specifically examines the words authors use around named entities in order to test how well these words attribute authorship.
McAnulty College and Graduate School of Liberal Arts;
Computational Mathematics
MS;
Thesis;


47

Xie, Cheng-En, and 謝承恩. "Quantitative Authorship Attribution in Early Chinese Buddhist Translations." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/76439061447388436507.

Full text

Abstract:

Master's degree
Dharma Drum Buddhist College (法鼓佛教學院)
Department of Buddhist Studies
100
The Taishō edition of the Chinese Buddhist canon (1924-1932) collects ca. 1000 Indian texts that were translated into Chinese between the 2nd and the 11th century CE. 153 of these texts are marked as 失譯, indicating that the name(s) of the translator(s) are unknown. For the texts translated between the 2nd and the late 6th century, however, we have to confront the dilemma that many attributions are uncertain, problematic or simply wrong. Over the years Buddhist scholars have leveraged traditional text-critical methods to corroborate or dispute traditional attributions. Although these methods can produce high quality results, they often rely heavily on the intuition of a single scholar honed over many years of research. Information technology offers an alternative vector of inquiry that aims to complement rather than supersede more traditional approaches. For this we adopt statistical, quantitative methods and artificial intelligence algorithms to analyze ancient Buddhist texts translated into Chinese in order to discover new evidence bearing on translator attribution. The major advantage of stylometrics and quantitative authorship attribution is the ability to discover hidden patterns that cannot be discerned by traditional approaches. In the past four decades, considerable attention has been paid to quantitative authorship attribution of literature in western languages; however, there have been only a few attempts focusing on texts written in classical Chinese, much less on the particular form of ‘Indian Buddhist Chinese’ found in early translated texts. In this paper, our main focus is on grammatical particles (xuci 虛詞), which are widely used in classical Chinese to express grammatical relations. After measuring their occurrence in Indian Buddhist Chinese, we use Principal Component Analysis (PCA) to discuss how their use reflects on the authorship of some selected sutras. Our analysis explores different scenarios that have to be accounted for, such as a translator changing his style in the course of his career, how to understand commonalities between contemporaneous translations, and how to quantify the difference between different translations of the same sutra. In the latter part of this thesis, we also apply our analysis model to the three different translations of the Gandhavyūha from different dynasties. We have conducted a series of experiments with different parameter values. Through the experiments, we demonstrate that T.278 was translated three to four hundred years earlier than T.279 and T.293, and show which of its features identify it as an earlier text.


48

Hu, Yifei. "A Study of Media Polarization with Authorship Attribution Methods." Thesis, 2020.

Find full text

Abstract:

Media polarization is a serious issue that can affect someone's views on matters ranging from a scientific fact to the perceived results of a presidential election. Media outlets in the United States are aligned along a political spectrum, representing different stances on various issues. Without providing any false information (but usually by omitting some facts), media outlets can report events by deliberately using words and styles that favor particular political positions.

This research investigated U.S. media polarization with authorship attribution approaches, analyzing stylistic differences between left-leaning and right-leaning media and discovering specific linguistic patterns that made news articles display biased political attitudes. Several models of authorship attribution were tested while controlling for topic, stance, and style, and were applied to media companies and their identity within a political spectrum. The style features compared included those carrying semantic and/or sentiment-related information, such as stance taking, and features that seemingly do not capture it, such as part-of-speech tags. The results demonstrate that articles can be successfully classified as left-leaning or right-leaning regardless of their stance. Finally, we provide an analysis of the patterns that we found.


49

TANI, RAFFAELLI GIULIO. "Generative models for inference: an application to authorship attribution." Doctoral thesis, 2022. http://hdl.handle.net/11573/1637529.

Full text

Abstract:

Computer-aided stylometry is a powerful tool in authorship attribution. Recent models can pinpoint the author of an anonymous text among thousands of candidates or distinguish different contributors to one text. However, most methods are quite complex and depend on the language. We propose a new authorship attribution method based on inference using a stochastic process. Every author is associated with the process that is most likely to reproduce their known corpus. We assign a text to the author whose process gives the highest probability of producing the text. We find high attribution rates independent of the language of the text or the tokenisation. Inference using stochastic processes offers exciting opportunities for stylometry and information retrieval.
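The decision rule described here (fit one generative process per author, then pick the author whose process gives the text the highest probability) can be sketched with a deliberately simple stand-in. The abstract does not specify the stochastic process, so the character-bigram Markov chain with add-one smoothing below is an assumption made purely for illustration, and the class and function names are hypothetical.

```python
import math
from collections import Counter


class BigramModel:
    """Character-bigram Markov chain with add-one smoothing (illustrative stand-in)."""

    def __init__(self, corpus, alphabet):
        self.vocab_size = len(alphabet)
        # Counts of adjacent character pairs and of each pair's first character.
        self.pairs = Counter(zip(corpus, corpus[1:]))
        self.firsts = Counter(corpus[:-1])

    def log_prob(self, text):
        """Log-probability of the transitions in `text` under this model."""
        return sum(
            math.log((self.pairs[(a, b)] + 1) / (self.firsts[a] + self.vocab_size))
            for a, b in zip(text, text[1:])
        )


def attribute_by_likelihood(text, corpora):
    """Assign `text` to the author whose model gives it the highest log-probability."""
    # Shared alphabet so that smoothing is comparable across authors.
    alphabet = set(text).union(*corpora.values())
    models = {author: BigramModel(corpus, alphabet) for author, corpus in corpora.items()}
    return max(models, key=lambda author: models[author].log_prob(text))
```

Because the model works on raw characters, it needs no language-specific tokenisation, which loosely mirrors the language independence the abstract reports; the actual process used in the thesis may differ substantially.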


50

King, Edmund (Edmund George Coghill). "In the character of Shakespeare: canon, authorship and attribution in eighteenth-century England." 2008. http://hdl.handle.net/2292/2615.

Full text

Abstract:

At various points between 1709 and 1821, Shakespeare’s scholarly editors called into question the authenticity - either in whole or in part - of at least seventeen of the plays attributed to him in the First Folio. Enabled largely by Alexander Pope’s attack, in his 1723–25 edition of Shakespeare, on the Folio’s compilers, eighteenth-century textual critics constructed a canon based upon their own critical senses, rather than the ‘authority of copies’. They also discussed the genuineness of works that had been excluded from the 1623 Folio - Pericles, The Two Noble Kinsmen, Edward III, the Sonnets, and the poems published in The Passionate Pilgrim. Although these debates had little effect on the contents of the variorum edition - by 1821, only Pericles, the Sonnets, and the narrative poems had been added to the canon - arguments and counter-arguments about the authenticity of Shakespeare’s works continued to abound in the notes. These would, in turn, influence the opinions of new generations of critics throughout the nineteenth and early twentieth centuries. In this thesis, I return to these earlier canonical judgements, not in order to resuscitate them, but to ask what they reveal about eighteenth-century conceptions of authorship, collaboration, and canonicity. Authorship in the period was not understood solely in terms of ‘possessive individualism’. Neither were arguments over Shakespeare’s style wholly contingent upon new discourses of literary property that had developed in the wake of copyright law. Instead, I argue, the discourse of personal style that editors applied to Shakespeare emerged out of a pre-existing classical-humanist scholarly tradition. Other commentators adopted the newly fashionable language of connoisseurship to determine where Shakespeare’s authorial presence lay. Another group of scholars turned to contemporary stage manuscript practices to ascertain where, and why, the words of other speakers might have entered his plays. 
If, however, Shakespeare’s plays were only partly his, this implied that Shakespeare had written alongside other writers. In the last part of my thesis, I examine the efforts of eighteenth-century critics to understand the social contexts of early modern dramatic authorship. Pope represented the theatre as an engine of social corruption, whose influence had debased Shakespeare’s standards of art and language. Other eighteenth-century commentators, however, had a more positive understanding of the social aspects of authorship. Drawing on contemporary discourses of friendship and sociability, they imagined the Elizabethan stage as a friendship-based authorial credit network, where playwrights collaborated with their contemporaries in the expectation of a return on their own works. This language of sociable co-authorship in turn influenced the way in which Shakespearean collaboration was understood. Conceptions of Shakespearean authorship and canonicity in the period, I conclude, were - like authorship in the Shakespeare canon itself - not singular, but manifold and multivocal.
