Dissertations / Theses on the topic 'Text'

To see the other types of publications on this topic, follow the link: Text.

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Select a source type:

Consult the top 50 dissertations / theses for your research on the topic 'Text.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online, whenever these are available in the metadata.

Browse dissertations / theses from a wide variety of disciplines and organise your bibliography correctly.

1

Berio, Luciano. "Text of Texts." Bärenreiter Verlag, 1998. https://slub.qucosa.de/id/qucosa%3A36791.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

El, Morabit Karim [Verfasser], and U. [Akademischer Betreuer] Husemann. "Measurement of $\text{t}\bar{\text{t}}\text{H}(\text{H}\rightarrow \text{b}\bar{\text{b}})$ production in the semi-leptonic $\text{t}\bar{\text{t}}$ decay channel at the CMS Experiment / Karim El Morabit ; Betreuer: U. Husemann." Karlsruhe : KIT-Bibliothek, 2021. http://d-nb.info/1238147801/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Beaulieu, Derek. "Text without text : concrete poetry and conceptual writing." Thesis, University of Roehampton, 2015. https://pure.roehampton.ac.uk/portal/en/studentthesis/text-without-text(9881aca7-f74a-4f6e-b8d2-58d83c01d7ae).html.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Concrete poetry has been posited as the only truly international poetic movement of the 20th century, with Conceptual writing receiving the same cultural location for the 21st century. Both forms are dedicated to a materiality of textual production, a poetic investigation into how language occupies space. My dissertation, Text Without Text: Concrete Poetry and Conceptual Writing, consists of three chapters: “Dirty,” “Clean,” and “Conceptual.” Chapter One outlines how degenerated text features in Canadian avant-garde poetics and how my own work builds upon traditions formulated by the Canadian poets bpNichol, bill bissett and Steve McCaffery, and can be formulated as an “inarticulate mark,” embodying what the American theorist Sianne Ngai refers to as a “poetics of disgust.” Chapter Two, “Clean,” situates my later work around the theories of Eugen Gomringer, the Noigandres Group and Mary Ellen Solt: the clean, affectless use of the particles of language in a manner that echoes modern advertising and graphic design, creating universally understood poetry that embraces logos, trademarks and way-finding signage. Chapter Three, “Conceptual,” bridges my concrete poetry with my work in Conceptual writing, especially my novels Local Colour and Flatland. Conceptual writing, as theorized by Kenneth Goldsmith, Vanessa Place and others, works to interrogate a poetics of “uncreativity,” plagiarism, digitally aleatory writing and procedurality. Text Without Text: Concrete Poetry and Conceptual Writing also includes three appendices that outline my poetic oeuvre to date.
4

Haggren, Hugo. "Text Similarity Analysis for Test Suite Minimization." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-290239.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Software testing is the most expensive phase in the software development life cycle. It is thus understandable why test optimization is a crucial area in the software development domain. In software testing, the gradual increase of test cases demands large portions of testing resources (budget and time). Test Suite Minimization is considered a potential approach to deal with the test suite size problem. Several test suite minimization techniques have been proposed to efficiently address the test suite size problem. Proposing a good solution for test suite minimization is a challenging task, where several parameters such as code coverage, requirement coverage, and testing cost need to be considered before removing a test case from the testing cycle. This thesis proposes and evaluates two different NLP-based approaches for similarity analysis between manual integration test cases, which can be employed for test suite minimization. One approach is based on syntactic text similarity analysis and the other is a machine learning based semantic approach. The feasibility of the proposed solutions is studied through analysis of industrial use cases at Ericsson AB in Sweden. The results show that the semantic approach barely manages to outperform the syntactic approach. While both approaches show promise, subsequent studies will have to be done to further evaluate the semantic similarity based method.
5

Ly, Man Dan. "Text to features for Swedish text." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-396578.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
In text mining, texts are usually transformed into numerical vectors or feature vectors before they are given to a machine learning algorithm for text classification. In this project, a set of features for classifying tweets in Swedish was created. The following classification tasks were selected: gender, age and political party prediction, sentiment analysis, and authorship attribution, which is the task of determining whether a text was written by a particular author or not. Relevant previous studies were researched and a suitable subset of features used in those studies was chosen. A tool was developed that preprocesses the tweets and calculates, for each tweet, values for the features in the feature set. Experiments were run on a data set consisting of tweets written by Swedish politicians. The output of the tool was given to a machine learning algorithm that created classification models. While the first four classification tasks were unsuccessful, some of the authorship attribution models managed to produce an F-score between 80 and 90%. For the failed classification tasks, the features need to be tested on a different data set or new features have to be created.
6

Jung, Ki-Ho. "Text." Hannover : Techn. Univ, 1988. http://www.gbv.de/dms/weimar/toc/12547072X_toc.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Wilson, Christin M. L. "Variation and Text Type in Old Occitan Texts." The Ohio State University, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=osu1331136026.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

SOARES, FABIO DE AZEVEDO. "AUTOMATIC TEXT CATEGORIZATION BASED ON TEXT MINING." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2013. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=23213@1.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Text Categorization, one of the tasks performed in Text Mining, can be described as obtaining a function that assigns a document to the previously defined category to which it belongs. The main goal of building a taxonomy of documents is to make it easier to obtain relevant information. However, implementing and executing Text Categorization is not a trivial task: Text Mining tools are still maturing and require considerable technical expertise to use. Moreover, the language in which the documents are written plays a major role in a Text Mining process and must be treated with its own particularities, yet there is a great shortage of tools that handle Brazilian Portuguese properly. Thus, the main aims of this work are to research, propose, implement and evaluate a Text Mining framework for Automatic Text Categorization that is capable of assisting the knowledge discovery process and that provides linguistic processing for Brazilian Portuguese.
9

Baker, Simon. "Semantic text classification for cancer text mining." Thesis, University of Cambridge, 2018. https://www.repository.cam.ac.uk/handle/1810/275838.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Cancer researchers and oncologists benefit greatly from text mining major knowledge sources in biomedicine such as PubMed. Fundamentally, text mining depends on accurate text classification. In conventional natural language processing (NLP), this requires experts to annotate scientific text, which is costly and time consuming, resulting in small labelled datasets. This leads to extensive feature engineering and handcrafting in order to fully utilise small labelled datasets, which is again time consuming, and not portable between tasks and domains. In this work, we explore emerging neural network methods to reduce the burden of feature engineering while outperforming the accuracy of conventional pipeline NLP techniques. We focus specifically on the cancer domain in terms of applications, where we introduce two NLP classification tasks and datasets: the first task is that of semantic text classification according to the Hallmarks of Cancer (HoC), which enables text mining of scientific literature assisted by a taxonomy that explains the processes by which cancer starts and spreads in the body. The second task is that of the exposure routes of chemicals into the body that may lead to exposure to carcinogens. We present several novel contributions. We introduce two new semantic classification tasks (the hallmarks, and exposure routes) at both sentence and document levels along with accompanying datasets, and implement and investigate a conventional pipeline NLP classification approach for both tasks, performing both intrinsic and extrinsic evaluation. We propose a new approach to classification using multilevel embeddings and apply this approach to several tasks; we subsequently apply deep learning methods to the task of hallmark classification and evaluate its outcome. Utilising our text classification methods, we develop two novel text mining tools targeting real-world cancer researchers.
The first tool is a cancer hallmark text mining tool that identifies association between a search query and cancer hallmarks; the second tool is a new literature-based discovery (LBD) system designed for the cancer domain. We evaluate both tools with end users (cancer researchers) and find they demonstrate good accuracy and promising potential for cancer research.
10

Powers, Harold S. "Music as Text and Text as Music." Bärenreiter Verlag, 1998. https://slub.qucosa.de/id/qucosa%3A36794.

Full text
APA, Harvard, Vancouver, ISO, and other styles
11

Krummacher, Friedhelm. "Text im Text. Über Vokalmusik und Texttheorie." Bärenreiter Verlag, 1998. https://slub.qucosa.de/id/qucosa%3A36796.

Full text
APA, Harvard, Vancouver, ISO, and other styles
12

Mešár, Marek. "Svět kolem nás jako hyperlink." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2013. http://www.nusl.cz/ntk/nusl-236204.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
This document describes selected techniques and approaches to the problem of text detection, extraction and recognition on modern mobile devices. It also describes how the results are presented in the user interface and converted to hyperlinks that serve as a source of information about the surrounding world. The paper outlines a text detection and recognition technique based on MSER detection, and describes the use of an image-feature tracking method for estimating text motion.
13

Lorentzon, Daniela. "Från satsatom till hel text : Sammanhang i text." Thesis, Växjö University, Växjö University, Växjö University, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:vxu:diva-5223.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:

The starting point for the study presented in this essay is Bengt Sigurd's investigation of cohesion in student texts, Tre experiment med text: Att presentera Buffalo Bill (1977). The aim of this essay is to examine cohesion in student texts written from the same assignment (Sigurd 1977), in order to see how the students handle cohesion in a "satsatom" (clause atom) task. A clause atom is the smallest possible information-bearing clause, e.g. "The girl walks" and "The ball is round". Students from one vocational and one academic upper-secondary programme were given a clause-atom assignment about the western hero Buffalo Bill. The task was to write a coherent and well-formulated text from 24 given clause atoms. In total, eight essays, four from each class, were examined with a focus on cohesion, content and coherence.

The study shows that the two classes interpret the instructions differently, and this has consequences for the content and for the cohesive ties examined. This is especially apparent in the referential ties: the vocational class added information-bearing clauses to their texts and therefore shows more partial identities and associative ties than the academic class, which added no information of its own. As for connective ties, the vocational class uses more connectives, and these are characteristic of narrative text. In three of the eight student texts there are deficiencies in cohesion, e.g. incorrect choices of connectives and reference errors within the reference chains. These deficiencies disturb the coherence of the text and thereby reading comprehension.

14

Nyns, Roland. "Text grammar and text processing: a cognitivist approach." Doctoral thesis, Universite Libre de Bruxelles, 1989. http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/213285.

Full text
APA, Harvard, Vancouver, ISO, and other styles
15

Goyette, Els Spekkens. "Second-language text comprehension : knowledge and text type." Thesis, McGill University, 1991. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=59956.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
The purpose of this study was to compare first- and second-language text comprehension across passage types.
Results indicate that there was no main effect for language when the total texts were compared. In contrast, a large difference was found for the type of passage read. Significantly higher recall and inferencing were found on the passages for which subjects had prior knowledge, regardless of the language of presentation. Although global comprehension measures did not reveal differences in text processing, more detailed paragraph-level analyses indicated that text processing differences were present.
Total reading times indicated that there was a large effect for the language in which the passage was read, with significantly longer reading times recorded for passages read in the second language.
These findings were interpreted as an indication that second-language reading comprehension capacity is underestimated. The findings also suggest that the type of passage read influences text comprehension more than the language in which it is read.
16

NUNES, IAN MONTEIRO. "CLUSTERING TEXT STRUCTURED DATA BASED ON TEXT SIMILARITY." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2008. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=25796@1.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
This document reports our findings on a set of text clustering experiments in which a wide variety of models and algorithms were applied. The objective of these experiments is to investigate the most feasible strategies for processing large amounts of information, given the growing demands on data quality in many fields. The deduplication process was accelerated by dividing the data set into subsets of similar items. In the best-case scenario, each subset contains all duplicates of each record, reducing the clustering error to zero. A tolerance of 5 percent after the clustering process was nevertheless established. The experiments show that processing time is significantly lower, with precision of up to 98.92 percent. The best accuracy/performance trade-off is achieved with the K-Means algorithm using a trigram-based model.
17

Ludlow, Nelson David. "Pictorial representation of text : converting text to pictures." Thesis, University of Edinburgh, 1993. http://hdl.handle.net/1842/19944.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
'A picture is worth a thousand words.' This saying suggests that an inter-relationship exists between text and pictures. This thesis is the result of an investigation to identify and exploit the inter-relationships between text and pictures. It describes a concept of pictorial representation of text, presents a Text-to-Pictures System which generates a pictorial representation of English sentences, and gives a more detailed look at how the system pictorially represents specific linguistic types of natural language expressions. Although very little previous work has been done in this area, the relevant work in text-pictures systems is summarized. Most of the past work concentrated on pictorially representing the nouns and some spatial prepositions. My work expands the pictorial representation to include temporal expressions, conjunction, relative clauses, quantification, and some verb features. This thesis also addresses the concepts of using pictorial representation for data fusion of large natural language texts, as well as the problems of ambiguity and vagueness. The working system to 'convert text to pictures' is demonstrated. The system structure, intermediate representation schemes, and the translation process are described. Several types of natural language expressions are examined and the corresponding pictorial representations are shown. Also shown is an application to pictorially represent all of the possible meanings of an ambiguous sentence to allow a non-linguist user to choose the intended meaning. Using a natural language text processing system that can convert a sentence of text into a logical form (LF) representation, I show what is required to convert the LF representation into a pictorial representation. The process involves identifying the objects contained in the LF and representing them by icons. These icons are placed in an imaginary space via a set of constraints. 
After all of the constraints are determined, the system attempts to solve the generated constraint satisfaction problem. If a solution is found, the icons are drawn with the appropriate coordinates on a graphics display.
18

Berglund, Anna. "Tonsätta text." Thesis, Kungl. Musikhögskolan, Institutionen för jazz, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kmh:diva-2040.

Full text
APA, Harvard, Vancouver, ISO, and other styles
19

Swanson, Erin. "Generation text." Click here for online access, 2008. http://hdl.handle.net/10504/90.

Full text
APA, Harvard, Vancouver, ISO, and other styles
20

Sharp, L. Kathryn. "Text Complexity." Digital Commons @ East Tennessee State University, 2014. https://dc.etsu.edu/etsu-works/4290.

Full text
APA, Harvard, Vancouver, ISO, and other styles
21

Yum, Sichang. "How coherent texts with varied examples help students learn about statistics : a test of text processing models /." Digital version accessible at:, 1998. http://wwwlib.umi.com/cr/utexas/main.

Full text
APA, Harvard, Vancouver, ISO, and other styles
22

Biedert, Ralf [Verfasser]. "Gaze-Based Human-Text Interaction/Text 2.0 / Ralf Biedert." München : Verlag Dr. Hut, 2014. http://d-nb.info/1050331605/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
23

Lee, Hyo Sook. "Automatic text processing for Korean language free text retrieval." Thesis, University of Sheffield, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.322916.

Full text
APA, Harvard, Vancouver, ISO, and other styles
24

Nackoul, David Douglas. "Text to Text : plot unit searches generated from English." Thesis, Massachusetts Institute of Technology, 2010. http://hdl.handle.net/1721.1/61175.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010.
The story of Macbeth centers around revenge. World War I was started by an act of revenge. Even though these two stories are seemingly unrelated, humans use the same concept to draw meaning from them. Plot units, revenge included, are the common set of structures found in human narrative. They are the mistakes, the successes, the revenges and the Pyrrhic victories. They are the basic building blocks of stories. In order to build a computational model of human intelligence, it is clear that we must understand how to process plot units. This thesis takes a step in that direction. It presents an English template for describing plot units and a system that is capable of turning these descriptions into plot-unit searches on stories. It currently processes 26 plot units, and finds 10 plot units spread out over Macbeth, Hamlet, the E-R Cyber Conflict, and a collection of legal case briefs.
25

Роєнко, Л. В. "Text title as a tool in achieving text coherence." Thesis, Вінницький державний педагогічний університет імені Михайла Коцюбинського, 2018. https://er.knutd.edu.ua/handle/123456789/12539.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
This article analyzes the role of the text title in achieving text coherence. The functions of the title in fictional texts and its contribution to text integrity are considered.
26

Lovid, Marcus. "Jag är inte dyslexi version 20 : Text på text." Thesis, Konstfack, IBIS - Institutionen för bild- och slöjdpedagogik, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:konstfack:diva-5920.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
This essay deals with my experiences as a dyslexic and with what my difficulties have been in relation to my own text. By writing a text that is then reworked and printed on fabric as an artistic presentation, I hope to reach more people and create an understanding of what it is like to be dyslexic, and of how this can be used in teaching to make things easier for pupils: for example, dividing a text into smaller sections so the information is easier to take in, or using croquis drawing to train the brain to copy text with quick, simple movements, which makes it easier for a dyslexic pupil to copy from the board or take notes from it. The essay is also a reckoning with my diagnosis and with how I process and try to understand my dyslexia. I have therefore created an artistic presentation in relation to my text: my essay printed on a piece of fabric, three metres wide and five metres long, with a grey background, light grey text and red highlighted text. This lets the viewer experience what it is like to be dyslexic and to struggle to focus, to read, and to find one's place again in a text that never ends. The red highlighted passages give the eye relief, but they also carry information from the essay about my inner thoughts, intended to arouse the viewer's curiosity.
27

Danielsson, Benjamin. "A Study on Text Classification Methods and Text Features." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-159992.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
When it comes to the task of classification, the data used for training is the most crucial part. It follows that how this data is processed and presented to the classifier plays an equally important role. This thesis investigates the performance of multiple classifiers depending on the features used, the type of classes to classify, and the optimization of said classifiers. The classifiers of interest are support-vector machines (SMO) and the multilayer perceptron (MLP); the features tested are word vector spaces and text complexity measures, along with principal component analysis (PCA) applied to the complexity measures. The features are created from the Stockholm-Umeå-Corpus (SUC) and DigInclude, a dataset containing standard and easy-to-read sentences. For the SUC dataset the classifiers attempted to classify texts into nine different text categories, while for the DigInclude dataset the sentences were classified as either standard or simplified. The classification tasks on the DigInclude dataset showed poor performance in all trials. The SUC dataset showed the best performance when using SMO in combination with word vector spaces. Comparing the SMO classifier on the text complexity measures with and without PCA showed that performance was largely unchanged, although not using PCA gave slightly better performance.
28

McDonald, Daniel Merrill. "Combining Text Structure and Meaning to Support Text Mining." Diss., The University of Arizona, 2006. http://hdl.handle.net/10150/194015.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Text mining methods strive to make unstructured text more useful for decision making. As part of the mining process, language is processed prior to analysis. Processing techniques have often focused primarily on either text structure or text meaning in preparing documents for analysis. As approaches have evolved over the years, increases in the use of lexical semantic parsing usually have come at the expense of full syntactic parsing. This work explores the benefits of combining structure and meaning, or syntax and lexical semantics, to support the text mining process. Chapter two presents the Arizona Summarizer, which includes several processing approaches to automatic text summarization. Each approach has varying usage of structural and lexical semantic information. The usefulness of the different summaries is evaluated in the finding stage of the text mining process. The summary produced using structural and lexical semantic information outperforms all others in the browse task. Chapter three presents the Arizona Relation Parser, a system for extracting relations from medical texts. The system is a grammar-based system that combines syntax and lexical semantic information in one grammar for relation extraction. The relation parser attempts to capitalize on the high precision performance of semantic systems and the good coverage of the syntax-based systems. The parser performs in line with the top reported systems in the literature. Chapter four presents the Arizona Entity Finder, a system for extracting named entities from text. The system greatly expands on the combination grammar approach from the relation parser. Each tag is given a semantic and syntactic component and placed in a tag hierarchy. Over 10,000 tags exist in the hierarchy. The system is tested on multiple domains and is required to extract seven additional types of entities in the second corpus.
The entity finder achieves a 90 percent F-measure on the MUC-7 data and an 87 percent F-measure on the Yahoo data, where additional entity types were extracted. Together, these three chapters demonstrate that combining text structure and meaning in algorithms to process language has the potential to improve the text mining process. A lexical semantic grammar is effective at recognizing domain-specific entities and language constructs. Syntax information, on the other hand, allows a grammar to generalize its rules when possible. Balancing performance and coverage in light of the world's growing body of unstructured text is important.
29

Haselton, Curt B., and Gregory G. Deierlein. "Assessing seismic collapse safety of modern reinforced concrete moment-frame buildings." Berkeley, Calif. : Pacific Earthquake Engineering Research Center, 2008. http://nisee.berkeley.edu/elibrary/Text/200803261.

Full text
APA, Harvard, Vancouver, ISO, and other styles
30

Souksengphet-Dachlauer, Anna. "Text als Klangmaterial : Heiner Müllers Texte in Heiner Goebbels' Hörstücken." Bielefeld : Transcript, 2009. http://d-nb.info/998766402/04.

Full text
APA, Harvard, Vancouver, ISO, and other styles
31

Kavanagh, Judith. "The Text Analyzer: A tool for knowledge acquisition from texts." Thesis, University of Ottawa (Canada), 1995. http://hdl.handle.net/10393/10149.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
The world is being inundated with knowledge at an ever-increasing rate. As intelligent beings and users of knowledge, we must find new ways to locate particular items of information in this huge reservoir of knowledge or we will soon be overwhelmed with enormous quantities of documents that no one any longer has time to read. The vast majority of knowledge is still being stored in conventional text written in natural language, such as books and articles, rather than in more "advanced" forms like knowledge bases. With more and more of these texts being stored on-line rather than solely in print, an opportunity exists to make use of the power of the computer to aid in the location and analysis of knowledge in on-line texts. We propose a tool to do this--the Text Analyzer. We have combined methods from computational linguistics and artificial intelligence to provide the users of the Text Analyzer with a variety of options for finding information in documents, verifying the consistency of this information, performing word and conceptual analyses and other operations. Parsing and indexing are not used in the Text Analyzer. The Text Analyzer can be connected to CODE4, a knowledge management system, so that a knowledge base can be constructed as knowledge is found in the text. We believe this tool will be especially useful for linguists, knowledge engineers, and document specialists.
32

Unaldi, Aylin. "Investigating reading for academic purposes : sentence, text and multiple texts." Thesis, University of Bedfordshire, 2010. http://hdl.handle.net/10547/279255.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
This study examines the nature of reading in academic environments and suggests ways for a more appropriate assessment of it. Research studies show that reading in academic settings is a complex knowledge management process in which information is selected, combined and organised not from a single, isolated text but from multiple information sources. This study initially gathered evidence from students studying at a British university on their perceived and observed reading purposes and processes in three studies: a large-scale questionnaire, a longitudinal reading diary study and, finally, individual interviews, in order both to establish whether the prominent reading skills they used were as put forth in the studies on academic reading, and to examine in detail the actual cognitive processes (reading operations) used in reading for academic purposes. The study draws on the reading theories that explain reading comprehension and focuses specifically on different levels of careful reading, such as sentence, text and multiple texts, in order to explicate that increasingly more complex cognitive processes explain higher levels of reading comprehension. Building on the findings from the three initial studies, it is suggested that reading tests of English for Academic Purposes (EAP) should involve not only local-level comprehension questions but also reading tasks at text and multiple-text levels. For this aim, taking the Khalifa and Weir (2009) framework as the basis, cognitive processes extracted from the theories defining each level of reading, and contextual features extracted through the analysis of university course books, were combined to form the test specifications for each level of careful reading, and sample tests assessing careful reading at sentence, text and intertextual levels were designed.
Statistical findings confirmed the differential nature of the three levels of careful reading; however, the expected difficulty continuum could not be observed among the tests. Possible reasons underlying this are discussed, suggestions on reading tasks that might operationalise text level reading more efficiently and intertextual level reading more extensively are made and additional components of intertextual reading are offered for the Khalifa and Weir (2009) reading framework. The implications of the findings for the teaching and assessment of English for Academic Purposes are also discussed.
33

Souksengphet-Dachlauer, Anna. "Text als Klangmaterial : Heiner Müllers Texte in Heiner Goebbels' Hörstücken /." Bielefeld : Transcript, 2010. http://deposit.d-nb.de/cgi-bin/dokserv?id=3391383&prov=M&dok_var=1&dok_ext=htm.

Full text
APA, Harvard, Vancouver, ISO, and other styles
34

Keith, Karin, and Renee Rice Moran. "Using Text Sets to Scaffold Student Reading of Complex Texts." Digital Commons @ East Tennessee State University, 2013. https://dc.etsu.edu/etsu-works/1012.

Full text
APA, Harvard, Vancouver, ISO, and other styles
35

Rosell, Magnus. "Text Clustering Exploration : Swedish Text Representation and Clustering Results Unraveled." Doctoral thesis, KTH, Numerisk Analys och Datalogi, NADA, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-10129.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Text clustering divides a set of texts into clusters (parts), so that texts within each cluster are similar in content. It may be used to uncover the structure and content of unknown text sets as well as to give new perspectives on familiar ones. The main contributions of this thesis are an investigation of text representation for Swedish and some extensions of the work on how to use text clustering as an exploration tool. We have also done some work on synonyms and evaluation of clustering results. Text clustering, at least such as it is treated here, is performed using the vector space model, which is commonly used in information retrieval. This model represents texts by the words that appear in them and considers texts similar in content if they share many words. Languages differ in what is considered a word. We have investigated the impact of some of the characteristics of Swedish on text clustering. Swedish has more morphological variation than for instance English. We show that it is beneficial to use the lemma form of words rather than the word forms. Swedish has a rich production of solid compounds. Most of the constituents of these are used on their own as words and in several different compounds. In fact, Swedish solid compounds often correspond to phrases or open compounds in other languages. Our experiments show that it is beneficial to split solid compounds into their parts when building the representation. The vector space model does not regard word order. We have tried to extend it with nominal phrases in different ways. We have also tried to differentiate between homographs, words that look alike but mean different things, by augmenting all words with a tag indicating their part of speech. None of our experiments using phrases or part of speech information have shown any improvement over using the ordinary model. Evaluation of text clustering results is very hard. What is a good partition of a text set is inherently subjective. 
External quality measures compare a clustering with a (manual) categorization of the same text set. The theoretical best possible value for a measure is known, but it is not obvious what a good value is – text sets differ in difficulty to cluster and categorizations are more or less adapted to a particular text set. We describe how evaluation can be improved for cases where a text set has more than one categorization. In such cases the result of a clustering can be compared with the result for one of the categorizations, which we assume is a good partition. In some related work we have built a dictionary of synonyms. We use it to compare two different principles for automatic word relation extraction through clustering of words. Text clustering can be used to explore the contents of a text set. We have developed a visualization method that aids such exploration, and implemented it in a tool, called Infomat. It presents the representation matrix directly in two dimensions. When the order of texts and words are changed, by for instance clustering, distributional patterns that indicate similarities between texts and words appear. We have used Infomat to explore a set of free text answers about occupation from a questionnaire given to over 40 000 Swedish twins. The questionnaire also contained a closed answer regarding smoking. We compared several clusterings of the text answers to the closed answer, regarded as a categorization, by means of clustering evaluation. A recurring text cluster of high quality led us to formulate the hypothesis that “farmers smoke less than the average”, which we later could verify by reading previous studies. This hypothesis generation method could be used on any set of texts that is coupled with data that is restricted to a limited number of possible values.
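The vector space model described in the abstract can be sketched minimally: texts become word-count vectors, and two texts count as similar in content when they share many words, measured here by cosine similarity. This is a toy illustration assuming whitespace tokenization, not the thesis's implementation (which also lemmatizes and splits Swedish compounds):

```python
from collections import Counter
from math import sqrt

def vectorize(text: str) -> Counter:
    # Bag-of-words: a text is represented only by the words that appear in it.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine of the angle between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Invented example documents:
d1 = vectorize("klustring av svenska texter")
d2 = vectorize("klustring av texter")
d3 = vectorize("reinforced concrete frames")
assert cosine(d1, d2) > cosine(d1, d3)  # shared words imply similar content
```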
36

Pfitzner, Darius Mark. "An Investigation into User Text Query and Text Descriptor Construction." Flinders University. Computer Science, Engineering and Mathematics, 2009. http://catalogue.flinders.edu.au./local/adt/public/adt-SFU20090805.141402.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Cognitive limitations such as those described in Miller's (1956) work on channel capacity and Cowan's (2001) on short-term memory are factors in determining user cognitive load and, in turn, task performance. Inappropriate user cognitive load can reduce user efficiency in goal realization. For instance, if the user's attentional capacity is not appropriately applied to the task, distractor processing can tend to appropriate capacity from it. Conversely, if a task drives users beyond their short-term memory envelope, information loss may occur in its translation to long-term memory and subsequent retrieval for task-based processing. To manage user cognitive capacity in the task of text search, the interface should allow users to draw on their powerful and innate pattern recognition abilities. This harmonizes with Johnson-Laird's (1983) proposal that propositional representation is tied to mental models. Combined with the theory that knowledge is highly organized when stored in memory, an appropriate approach for cognitive load optimization would be to graphically present single documents, or clusters thereof, with an appropriate number and type of descriptors. These descriptors are commonly words and/or phrases. Information theory research suggests that words have different levels of importance in document topic differentiation. Although keyword identification is well researched, there is a lack of basic research into human preference regarding query formation and the heuristics users employ in search. This lack extends to features as elementary as the number of words preferred to describe and/or search for a document. Understanding these preferences will help balance the processing overheads of tasks like clustering against user cognitive load to realize a more efficient document retrieval process.
Common approaches such as search engine log analysis cannot provide this degree of understanding and do not allow clear identification of the intended set of target documents. This research endeavours to improve the manner in which text search returns are presented so that user performance under real world situations is enhanced. To this end we explore both how to appropriately present search information and results graphically to facilitate optimal cognitive and perceptual load/utilization, as well as how people use textual information in describing documents or constructing queries.
37

Temnikova, Irina. "Text complexity and text simplification in the crisis management domain." Thesis, University of Wolverhampton, 2012. http://hdl.handle.net/2436/297482.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Due to the fact that emergency situations can lead to substantial losses, both financial and in terms of human lives, it is essential that texts used in a crisis situation be clearly understandable. This thesis is concerned with the study of the complexity of the crisis management sub-language and with methods to produce new, clear texts and to rewrite pre-existing crisis management documents which are too complex to be understood. By doing this, this interdisciplinary study makes several contributions to the crisis management field. First, it contributes to the knowledge of the complexity of the texts used in the domain, by analysing the presence of a set of written language complexity issues derived from the psycholinguistic literature in a novel corpus of crisis management documents. Second, since the text complexity analysis shows that crisis management documents indeed exhibit high numbers of text complexity issues, the thesis adapts to the English language controlled language writing guidelines which, when applied to the crisis management language, reduce its complexity and ambiguity, leading to clear text documents. Third, since low quality of communication can have fatal consequences in emergency situations, the proposed controlled language guidelines and a set of texts which were re-written according to them are evaluated from multiple points of view. In order to achieve that, the thesis both applies existing evaluation approaches and develops new methods which are more appropriate for the task. These are used in two evaluation experiments – evaluation on extrinsic tasks and evaluation of users’ acceptability. 
The evaluations on extrinsic tasks (evaluating the impact of the controlled language on text complexity, reading comprehension under stress, manual translation, and machine translation tasks) show a positive impact of the controlled language on simplified documents and thus ensure the quality of the resource. The evaluation of users’ acceptability contributes additional findings about manual simplification and helps to determine directions for future implementation. The thesis also gives insight into reading comprehension, machine translation, and cross-language adaptability, and provides original contributions to machine translation, controlled languages, and natural language generation evaluation techniques, which make it valuable for several scientific fields, including Linguistics, Psycholinguistics, and a number of different sub-fields of NLP.
38

Anderson, Amy K. "Image/Text and Text/Image: Reimagining Multimodal Relationships through Dissociation." UKnowledge, 2014. http://uknowledge.uky.edu/english_etds/11.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
W.J.T. Mitchell has famously noted that we are in the midst of a “pictorial turn,” and images are playing an increasingly important role in digital and multimodal communication. My dissertation addresses the question of how meaning is made when texts and images are united in multimodal arguments. Visual rhetoricians have often attempted to understand text-image arguments by privileging one medium over the other, either using text-based rhetorical principles or developing new image-based theories. I argue that the relationship between the two media is more dynamic, and can be better understood by applying The New Rhetoric’s concept of dissociation, which Chaim Perelman and Lucie Olbrechts-Tyteca developed to demonstrate how the interaction of differently valued concepts can construct new meaning. My dissertation expands the range of dissociation by applying it specifically to visual contexts and using it to critique visual arguments in a series of historical moments when political, religious, and economic factors caused one form of media to be valued over the other: Byzantine Iconoclasm, the late medieval period, the 1950s advertising boom, and the modern digital age. In each of these periods, I argue that dissociation reveals how the privileged medium can shape an entire multimodal argument. I conclude with a discussion of dissociative multimodal pedagogy, applying dissociation to the multimodal composition classroom.
39

Wylie, Judith W. "Effects of prior knowledge and text structure on text memory." Thesis, Queen's University Belfast, 1993. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.359132.

Full text
APA, Harvard, Vancouver, ISO, and other styles
40

Read, Ian Harvey. "Approaches to prosody prediction for text-to-text speech synthesis." Thesis, University of East Anglia, 2007. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.436699.

Full text
APA, Harvard, Vancouver, ISO, and other styles
41

Gullick, Mark William. "The body of the text; the text of the body." Thesis, University of Sussex, 1994. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.297302.

Full text
APA, Harvard, Vancouver, ISO, and other styles
42

Savkov, Aleksandar Dimitrov. "Deciphering clinical text : concept recognition in primary care text notes." Thesis, University of Sussex, 2017. http://sro.sussex.ac.uk/id/eprint/68232/.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Electronic patient records, containing data about the health and care of a patient, are a valuable source of information for longitudinal clinical studies. The General Practice Research Database (GPRD) has collected patient records from UK primary care practices since the late 1980s. These records contain both structured data (in the form of codes and numeric values) and free text notes. While the structured data have been used extensively in clinical studies, there are significant practical obstacles in extracting information from the free text notes. The main obstacles are data access restrictions, due to the presence of sensitive information, and the specific language of medical practitioners, which renders standard language processing tools ineffective. The aim of this research is to investigate approaches for computer analysis of free text notes. The research involved designing a primary care text corpus (the Harvey Corpus) annotated with syntactic chunks and clinically-relevant semantic entities, developing a statistical chunking model, and devising a novel method for applying machine learning for entity recognition based on chunk annotation. The tools produced would facilitate reliable information extraction from primary care patient records, needed for the development of clinically-related research. The three medical concept types targeted in this thesis could contribute to epidemiological studies by enhancing the detection of co-morbidities, and better analysing the descriptions of patient experiences and treatments. The main contributions of the research reported in this thesis are: guidelines for chunk and concept annotation of clinical text, an approach to maximising agreement between human annotators, the Harvey Corpus, a method for using a standard part-of-speech tagging model in clinical text chunking, and a novel approach to recognising clinically relevant medical concepts.
43

Samec, Matěj. "Text jako komponent divadelní inscenace a text jako východisko interpretace." Master's thesis, Akademie múzických umění v Praze. Divadelní fakulta AMU. Knihovna, 2009. http://www.nusl.cz/ntk/nusl-79143.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
The author proposes to define the concept of creating a theatre scenario. He understands scenario writing as a specific form of dramaturgy and of the dramaturg's work. Based on an analysis of his own experience, he attempts to delineate two basic forms of scenario writing: the first is scenario writing in interpretative theatre, the second is scenario writing in non-interpretative (authorial) theatre, and he searches for their common ground. To clarify the chosen problem, he draws on examples of work with concrete texts (dramatic, prosaic, and "a priori non-artistic") that were fundamental to the process of creating the scenarios of five very different productions: Shakespeare's Titus Andronicus, a dramatization of a novel, a scenic collage, and two authorial projects.
44

Brifkany, Jan, and Yasini Anass El. "Text Recognition in Natural Images : A study in Text Detection." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-282935.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
In recent years, a surge of computer vision methods and solutions has been developed to address computer vision problems. By combining methods from different areas of computer vision, computer scientists have been able to develop more advanced and sophisticated models. This report covers two categories, text detection and text recognition, which are defined, described, and analyzed in the result and discussion chapter. The report addresses an exciting and challenging topic, text recognition in natural images. It set out to assess the improvement in OCR accuracy after image segmentation methods have been applied to the images. The methods used are maximally stable extremal regions (MSER) and geometric filtering based on geometric properties. The results showed that OCR with the segmentation methods had better overall accuracy than OCR without them. It was also shown that images with horizontal text orientation yielded better accuracy under OCR with segmentation methods than images with multi-oriented text.
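The geometric filtering step mentioned in the abstract can be illustrated as a bounding-box filter over candidate regions (such as those MSER produces). The thresholds below are assumptions made for this sketch, not the values used in the report:

```python
def plausible_character(box, min_area=20, max_area=5000,
                        min_aspect=0.1, max_aspect=2.5):
    """Keep a candidate region (x, y, w, h) only if its geometry could be a glyph.

    The thresholds are illustrative: real pipelines tune them to the
    image resolution and expected font sizes.
    """
    x, y, w, h = box
    area = w * h
    aspect = w / h  # width-to-height ratio; most glyphs are taller than wide
    return min_area <= area <= max_area and min_aspect <= aspect <= max_aspect

candidates = [(10, 10, 12, 30),   # letter-sized, upright -> kept
              (0, 0, 400, 3),     # long thin streak -> rejected (aspect)
              (5, 5, 2, 3)]       # speck of noise -> rejected (area)
kept = [b for b in candidates if plausible_character(b)]
print(kept)  # -> [(10, 10, 12, 30)]
```

Regions surviving the filter would then be passed to the OCR stage.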
45

Richardson, Andrew. "Acting the Absurd: Physical Theatre for Text/Text for Devising." VCU Scholars Compass, 2015. http://scholarscompass.vcu.edu/etd/3744.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
This paper considers two purposes for actor training—textual interpretation and devising original works—through the teaching of a class based on contemporary theatrical clown and physical theatre exercises which are then applied to Samuel Beckett’s Waiting for Godot. Devised work can be used to interpret a script, and a script can be used as a jumping-off point to devise new works. Beginning with an explanation of the teaching methods for the class, the paper then gives a background of clowns who performed in Beckett’s plays, and analyzes various productions' use of games to enliven text. Exercises from the class are used as examples of exploring the uncovering of clown personas and the application of games to both Beckett scene-work and invented theatre pieces. The students’ final performances are examined to demonstrate the effectiveness of the classwork, confirming that textual interpretation and devising are complementary instead of opposing practices.
46

Garcia, Constantino Matias. "On the use of text classification methods for text summarisation." Thesis, University of Liverpool, 2013. http://livrepository.liverpool.ac.uk/12957/.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
This thesis describes research work undertaken in the fields of text and questionnaire mining. More specifically, the research work is directed at the use of text classification techniques for the purpose of summarising the free text part of questionnaires. In this thesis text summarisation is conceived of as a form of text classification in that the classes assigned to text documents can be viewed as an indication (summarisation) of the main ideas of the original free text but in a coherent and reduced form. The reason for considering this type of summary is because summarising unstructured free text, such as that found in questionnaires, is not deemed to be effective using conventional text summarisation techniques. Four approaches are described in the context of the classification summarisation of free text from different sources, focused on the free text part of questionnaires. The first approach considers the use of standard classification techniques for text summarisation and was motivated by the desire to establish a benchmark with which the more specialised summarisation classification techniques presented later in this thesis could be compared. The second approach, called Classifier Generation Using Secondary Data (CGUSD), addresses the case when the available data is not considered sufficient for training purposes (or possibly because no data is available at all). The third approach, called Semi-Automated Rule Summarisation Extraction Tool (SARSET), presents a semi-automated classification technique to support document summarisation classification in which there is more involvement by the domain experts in the classifier generation process, the idea was that this might serve to produce more effective summaries. 
The fourth is a hierarchical summarisation classification approach which assumes that text summarisation can be achieved using a classification approach whereby several class labels can be associated with documents which then constitute the summarisation. For evaluation purposes three types of text were considered: (i) questionnaire free text, (ii) text from medical abstracts and (iii) text from news stories.
47

Romsdorfer, Harald. "Polyglot text to speech synthesis text analysis & prosody control." Aachen Shaker, 2009. http://d-nb.info/993448836/04.

Full text
APA, Harvard, Vancouver, ISO, and other styles
48

Miloš, Roman. "Metody shlukování textových dat." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2011. http://www.nusl.cz/ntk/nusl-237060.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Clustering of text data is one of the tasks of text mining. It divides documents into categories based on their similarity, and these categories make it easier to search the documents. This thesis describes the current methods used for text document clustering. From these methods we chose Simultaneous Keyword Identification and Clustering of text documents (SKWIC), which should achieve better results than standard clustering algorithms such as k-means. An application implementing this algorithm is designed and built. Finally, we compare SKWIC with the k-means algorithm.
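The k-means baseline that the thesis compares SKWIC against can be sketched in a few lines on toy term-frequency vectors. SKWIC additionally learns per-cluster keyword weights; this sketch omits that, and the vocabulary and documents are invented:

```python
import random

def dist2(a, b):
    # Squared Euclidean distance between two dense vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iters=20, seed=0):
    """Plain k-means: assign each vector to its nearest centroid, then
    recompute centroids as cluster means, and repeat."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda i: dist2(v, centroids[i]))
            clusters[nearest].append(v)
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return clusters

# Toy term-frequency vectors over the vocabulary ["cluster", "text", "goal"]:
docs = [[3, 2, 0], [4, 1, 0], [0, 0, 5], [0, 1, 4]]
parts = kmeans(docs, k=2)  # the two topic groups separate cleanly
```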
49

Ferre, Ricart Ernest. "Text i Tectònica." Doctoral thesis, Universitat Internacional de Catalunya, 2012. http://hdl.handle.net/10803/83922.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
The thesis Text i Tectònica consists of three parts. The first [chapter 1] provides a synthetic theoretical framework on the need for a holistic or globalizing perception of architecture as the basis for proposing a cultural change. The second part [chapters 2 & 3] develops the empirical research, conducted through a comparative, speculative reading of Le Corbusier's written work in two texts, Vers une Architecture (1923) and UNITÉ (1948). The results are organized into architectural concepts under an iconostasis structure (in altarpiece form) that Le Corbusier himself foresaw in that period. The third part [chapter 4] seeks to validate Le Corbusier's iconostasis structure as a method of artistic or architectural learning. The first validation comes from the theory of language, the second from poetic theory, and the last from philosophy, also known as architectonic philosophy. Finally, in [chapter 5] the author makes a concrete proposal to launch a comprehensive process in which the conclusions of the research and the validated methodology are shown as a seed for changing this perception.
50

Næss, Arild Brandrud. "Bayesian Text Categorization." Thesis, Norwegian University of Science and Technology, Department of Mathematical Sciences, 2007. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-9665.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:

Natural language processing is an interdisciplinary field of research which studies the problems and possibilities of automated generation and understanding of natural human languages. Text categorization is a central subfield of natural language processing. Automatically assigning categories to digital texts has a wide range of applications in today's information society, from filtering spam to creating web hierarchies and digital newspaper archives. It is a discipline that lends itself more naturally to machine learning than to knowledge engineering; statistical approaches to text categorization are therefore a promising field of inquiry. We provide a survey of the state of the art in text categorization, presenting the most widespread methods in use, with particular emphasis on support vector machines, an optimization algorithm that has emerged as the benchmark method in text categorization over the past ten years. We then turn our attention to Bayesian logistic regression, a fairly new and largely unstudied method in text categorization. This method shares certain similarities with the support vector machine method but differs from it in crucial respects. Notably, Bayesian logistic regression provides us with a statistical framework. It can be claimed to be more modular, in the sense that it is more open to modification and supplementation by other statistical methods, whereas the support vector machine method remains more of a black box. We present results of thorough testing of the BBR toolkit for Bayesian logistic regression on three separate data sets. We demonstrate which of BBR's parameters are of importance, and we show that its results compare favorably to those of the SVMlight toolkit for support vector machines. We also present two extensions to the BBR toolkit: one attempts to incorporate domain knowledge by way of the prior probability distributions of single words; the other tries to make use of uncategorized documents to boost learning accuracy.
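The Bayesian logistic regression the abstract describes can be illustrated with a minimal sketch: under a Gaussian prior on the weights, the MAP estimate is equivalent to L2-regularized logistic regression, which can be trained by gradient descent on a bag-of-words representation. The toy documents, vocabulary, and hyperparameters below are invented for illustration; this is not the BBR toolkit itself.

```python
import math

# Toy bag-of-words text categorization: spam (y=1) vs. ham (y=0).
# With a Gaussian prior N(0, 1/lam) on each weight, the MAP estimate of
# logistic regression minimizes the L2-regularized negative log likelihood.

docs = [
    ("win money now", 1),
    ("cheap money offer", 1),
    ("meeting schedule today", 0),
    ("project meeting notes", 0),
]

vocab = sorted({w for text, _ in docs for w in text.split()})

def featurize(text):
    """Term-frequency vector over the fixed vocabulary."""
    words = text.split()
    return [float(words.count(w)) for w in vocab]

X = [featurize(text) for text, _ in docs]
y = [label for _, label in docs]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(X, y, lam=0.1, lr=0.5, epochs=200):
    """Batch gradient descent on the negative log posterior."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        gw = [lam * wj for wj in w]  # gradient contribution of the Gaussian prior
        gb = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi
            gw = [gj + err * xj for gj, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * gj / n for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b

w, b = train(X, y)

def predict(text):
    x = featurize(text)
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)
```

Shrinking `lam` weakens the prior, approaching the maximum-likelihood fit; BBR additionally supports a Laplace prior, whose MAP estimate corresponds to L1 (lasso) regularization and yields sparse weight vectors.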

To the bibliography