Dissertations on the topic "Texte structuré"

To see the other types of publications on this topic, follow the link: Texte structuré.

Consult the top 50 dissertations for your research on the topic "Texte structuré".

Next to every source in the list of references there is an "Add to bibliography" button. Press it, and we will automatically generate the bibliographic reference to the chosen work in the citation style of your preference: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the scholarly publication in .pdf format and read its abstract online, whenever these are available in the metadata.

Browse dissertations from a wide variety of disciplines and organise your bibliography correctly.

1

Fan, Huihui. "Text Generation with and without Retrieval." Electronic Thesis or Diss., Université de Lorraine, 2021. http://www.theses.fr/2021LORR0164.

Abstract:
Every day we write --- from sending your mother a quick text to drafting a scientific article such as this thesis. The writing we do often goes hand-in-hand with automated assistance. For example, modern instant messaging software often suggests what word to write next, emails can be started with an autocomposer, and essays are improved with machine-suggested edits. These technologies are powered by years of research on text generation, a natural language processing field with the goal of automatically producing fluent, human-readable natural language. At a small scale, text generation systems can generate individual words or sentences, but have wide-reaching applications beyond that. For instance, systems for summarization, dialogue, and even the writing of entire Wikipedia articles are grounded in foundational text generation technology. Producing fluent, accurate, and useful natural language faces numerous challenges. Recent advances in text generation, principally leveraging training neural network architectures on large datasets, have significantly improved the surface-level readability of machine-generated text. However, current systems necessitate improvement along numerous axes, including generation beyond English and writing increasingly longer texts. While the field has seen rapid progress, much research focus has been directed towards the English language, where large-scale training and evaluation datasets for various tasks are readily available. Nevertheless, applications from autocorrect to autocomposition of text should be available universally. After all, by population, the majority of the world does not write in English. 
In this work, we create text generation systems for various tasks with the capability of incorporating languages beyond English, either as algorithms that easily extend to new languages or multilingual models encompassing up to 20 languages in one model. Beyond our work in multilingual text generation, we focus on a critical piece of generation systems: knowledge. A pre-requisite to writing well is knowing what to write. This concept of knowledge is incredibly important in text generation systems. For example, automatically writing an entire Wikipedia article requires extensive research on that article topic. The instinct to research is often intuitive --- decades ago people would have gone to a library, replaced now by the information available on the World Wide Web. However, for automated systems, the question is not only what knowledge to use to generate text, but also how to retrieve that knowledge and best utilize it to achieve the intended communication goal. We face the challenge of retrieval-based text generation. We present several techniques for identifying relevant knowledge at different scales: from local knowledge available in a paragraph to sifting through Wikipedia, and finally identifying the needle-in-the-haystack on the scale of the full web. We describe neural network architectures that can perform large-scale retrieval efficiently, utilizing pre-computation and caching mechanisms. Beyond how to retrieve knowledge, we further investigate the form the knowledge should take --- from natural language such as Wikipedia articles or text on the web to structured inputs in the form of knowledge graphs. Finally, we utilize these architectures in novel, much more challenging tasks that push the boundaries of where text generation models work well today: tasks that necessitate knowledge but also require models to produce long, structured natural language output, such as answering complex questions or writing full Wikipedia articles.
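The retrieve-then-generate pattern this abstract describes can be sketched minimally: score candidate passages against a query, keep the best one, and condition the output on it. The toy corpus and the bag-of-words cosine scorer below are illustrative assumptions, not the neural retrievers the thesis builds:

```python
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, passages: list[str]) -> str:
    """Retrieval step: return the passage most similar to the query."""
    q = Counter(query.lower().split())
    return max(passages, key=lambda p: cosine(q, Counter(p.lower().split())))

passages = [
    "Paris is the capital of France.",
    "The Eiffel Tower was completed in 1889.",
]
# A generator would then condition its output on the retrieved passage.
print(retrieve("when was the eiffel tower built", passages))
```

In the thesis's setting both steps are neural (dense retrieval with precomputation and caching, feeding a sequence-to-sequence decoder); the word-overlap scorer here only illustrates the shape of the pipeline.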
2

Barth, Elaine Maria Luz. "The effects of text structure instruction on efl reader's understanding of expository texts." reponame:Repositório Institucional da UFSC, 1990. https://repositorio.ufsc.br/xmlui/handle/123456789/157653.

Abstract:
Dissertation (master's) - Universidade Federal de Santa Catarina, Centro de Comunicação e Expressão, 1990.
3

Carter-Thomas, Shirley. "Texte et contexte : pour une approche fonctionnelle et empirique." Habilitation à diriger des recherches, Université de la Sorbonne nouvelle - Paris III, 2009. http://tel.archives-ouvertes.fr/tel-00482108.

Abstract:
Texts and contexts are irrevocably intertwined. While my work gives an important place to the notion of text and to the phenomena of textual cohesion and coherence, it is just as important for me to take the enunciative contexts into account. The textual forms examined are related to specific situations of utterance. The focus is above all on the interaction between text and context and on their relationship of interdependence. My aim is to examine the impact of certain linguistic choices on the interpretation of the unfolding text and, at the same time, to assess the influence of contextual factors on the choices that are made.
4

Luc, Christophe. "Représentation et composition des structures visuelles et rhétoriques du texte. Approche pour la génération de textes formatés." Toulouse 3, 2000. http://www.theses.fr/2000TOU30086.

Abstract:
The work presented in this thesis falls within the field of representing textual structures. We focus on the joint study of two types of structure: rhetorical structure (through Rhetorical Structure Theory, RST) and visual structure (through text architecture). The thesis is organised around two points: the study of a particular textual object, the enumeration; and text generation. First, concerning enumerations, we characterise certain phenomena specific to these objects and introduce the concept of the non-parallel enumeration (an enumeration whose constituents, the items, do not share the same function or the same visual form within the enumeration). We show that the two models must coexist in order to represent enumerative structures adequately, and we propose a general strategy for composing these models that applies to a large number of textual phenomena (including non-parallel enumerations). Second, concerning text generation, we study the relations between the RST model, text architecture, and lexico-syntactic knowledge about the text (obtained by decomposing a text into elementary sentences, in the sense of Z. Harris). This work is carried out on specific texts (game rules) whose textual material was controlled beforehand. On this basis, the text generation system presented determines and organises the three types of knowledge (rhetorical, architectural, and lexico-syntactic) in a coordinated way through a planning process, using interleaved strategies. The system is designed around specific textual phenomena such as instructional texts (game rules) and certain enumerations.
5

Haselton, Curt B., and Gregory G. Deierlein. "Assessing seismic collapse safety of modern reinforced concrete moment-frame buildings." Berkeley, Calif.: Pacific Earthquake Engineering Research Center, 2008. http://nisee.berkeley.edu/elibrary/Text/200803261.

6

Saint-Germain, Isabelle. "Le passage de l'article scientifique au texte vulgarisé : analyse de la structure, du contenu et de la rhétorique des textes." Mémoire, Université de Sherbrooke, 2004. http://savoirs.usherbrooke.ca/handle/11143/2361.

Abstract:
The different contexts in which a scientific article and a popular-science article are disseminated and presented led us, in this thesis, to examine the passage from the scientific text to the popularised text. More precisely, we asked what transformations occur, both in the structure of the texts and in their content, and what rhetorical effects the popularisation process can produce. The usual questions Who? What? Where? When? Why? are easily answered, but in order to understand the transformations the scientific text undergoes to reach its popularised form, we concentrated mainly on the How?, that is, on how the texts are written in each case. We therefore compared a scientific text and a popularised text dealing with exactly the same subject and written by the same author, so as to weigh their overall characteristics. (Abstract shortened by UMI.)
7

Saint-Germain, Isabelle. "Le passage de l'article scientifique au texte vulgarisé : analyse de la structure, du contenu et de la rhétorique des textes." Sherbrooke : Université de Sherbrooke, 2004.

8

Hsaio-Hui, Wu. "The effects of text structure on comprehending expository texts by EFL vocational university students in Taiwan." Thesis, Queen's University Belfast, 2016. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.707231.

Abstract:
The present study aimed to investigate the impact of explicit instruction in the text structure strategy (TSS) of expository discourse on reading comprehension among EFL vocational university students in Taiwan. Ninety-seven non-English major sophomores from two classes participated in this study. These two classes were randomly assigned to the intervention group receiving the TSS instruction and the control group receiving traditional instruction. The study also explored the relationship between the students’ identification and employment of rhetorical organisation and their performance in recall protocols. Whether rhetorically different organisations of expository prose would influence the students’ reading comprehension was explored as well. In addition, it was examined whether the students’ English reading proficiency level, self-perceived attitudes towards English, or gender would affect their reading comprehension, by comparing the intervention group who received the TSS instruction with the control group who received the conventional instruction. The students’ perception of text difficulty, differing organisational structures, forms of measurement, and their actual use of the TSS was further scrutinised. The results from the quantitative data indicated that the intervention group receiving the TSS instruction significantly surpassed the control group in their written recalls of expository text, demonstrating the usefulness of directly teaching the TSS for improving the students’ reading comprehension. The qualitative findings revealed that good readers employed the text structure strategy whereas poor readers adopted the default list strategy when approaching text. In addition, it was found that recall protocols may be a better measure of readers’ actual reading comprehension than multiple-choice questions.
9

Fell, Michael. "Traitement automatique des langues pour la recherche d'information musicale : analyse profonde de la structure et du contenu des paroles de chansons." Thesis, Université Côte d'Azur, 2020. http://www.theses.fr/2020COAZ4017.

Abstract:
Applications in Music Information Retrieval and Computational Musicology have traditionally relied on features extracted from the music content in the form of audio, but mostly ignored the song lyrics. More recently, improvements in fields such as music recommendation have been made by taking into account external metadata related to the song. In this thesis, we argue that extracting knowledge from the song lyrics is the next step to improve the user’s experience when interacting with music. To extract knowledge from vast amounts of song lyrics, we show for different textual aspects (their structure, content and perception) how Natural Language Processing methods can be adapted and successfully applied to lyrics. For the structural aspect of lyrics, we derive a structural description by introducing a model that efficiently segments the lyrics into their characteristic parts (e.g. intro, verse, chorus). In a second stage, we represent the content of lyrics by means of summarizing the lyrics in a way that respects the characteristic lyrics structure. Finally, on the perception of lyrics we investigate the problem of detecting explicit content in a song text. This task proves to be very hard, and we show that the difficulty partially arises from the subjective nature of perceiving lyrics in one way or another depending on the context. Furthermore, we touch on another problem of lyrics perception by presenting our preliminary results on Emotion Recognition. As a result, during the course of this thesis we have created the annotated WASABI Song Corpus, a dataset of two million songs with NLP lyrics annotations on various levels.
10

O'Malley, Claire E. "Structure and access : the role of structural factors in text comprehension and information." Thesis, University of Leeds, 1985. http://etheses.whiterose.ac.uk/322/.

Abstract:
In this thesis it is argued that structural factors play an important role in facilitating the access, comprehension, and recall of textual information, especially when the content of the material is unfamiliar to the reader. A study was made of the effects of manipulating text structure, familiarity of subjects with the text type, familiarity with the content, and instructions given to subjects, on comprehending and recalling information from scientific research reports. The results show that subjects familiar with the text type are able to make use of structure as an encoding strategy, and that the use of this structural strategy improves comprehension and recall when the content is unfamiliar. The study suggests that teaching readers to make use of structure in processing text can facilitate comprehension and recall. These results provide support for previous theories concerning the role of text structure, most of which have focused on narratives, to the neglect of research on expository prose. It is argued that some of the problems involved in the research using narratives, in particular the lack of distinction between structural factors and more general knowledge of the content, may be obviated by research with other text types, such as the one used in this study. It is also argued that users of computer-based documentation systems need similar kinds of structural cues for accessing online information as they do for offline text comprehension and retrieval, and that the efficacy and type of such structural cues depend on several factors, such as the task requirements, as well as the level of experience of the user. A second study examined the patterns of use of an online documentation system, and showed that users need different forms of organisation of the information as 'access structures', depending on different task requirements. 
Finally, proposals are made for improving the design of online documentation systems and for conducting future research into the needs of users of such systems.
11

Dzunic, Zoran Ph D. Massachusetts Institute of Technology. "Text structure-aware classification." Thesis, Massachusetts Institute of Technology, 2009. http://hdl.handle.net/1721.1/53315.

Abstract:
Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009.
Cataloged from PDF version of thesis.
Includes bibliographical references (p. 73-76).
Bag-of-words representations are used in many NLP applications, such as text classification and sentiment analysis. These representations ignore relations across different sentences in a text and disregard the underlying structure of documents. In this work, we present a method for text classification that takes into account document structure and only considers segments that contain information relevant for a classification task. In contrast to the previous work, which assumes that relevance annotation is given, we perform the relevance prediction in an unsupervised fashion. We develop a Conditional Bayesian Network model that incorporates relevance as a hidden variable of a target classifier. Relevance and label predictions are performed jointly, optimizing the relevance component for the best result of the target classifier. Our work demonstrates that incorporating structural information in document analysis yields significant performance gains over bag-of-words approaches on some NLP tasks.
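The idea in the abstract above, predicting which segments are relevant and classifying from those alone rather than from a whole-document bag of words, can be illustrated with a deliberately crude sketch. The hand-written topic lexicon and word-count classifier below are hypothetical stand-ins for the learned Conditional Bayesian Network, where relevance is a hidden variable optimized jointly with the label:

```python
POSITIVE = {"great", "excellent", "love"}
NEGATIVE = {"poor", "bad", "terrible"}
TOPIC = {"battery", "screen", "camera"}   # hypothetical relevance lexicon

def relevant(segment: str) -> bool:
    """Crude stand-in for the hidden relevance variable: keep a segment
    only if it mentions the topic of interest."""
    return bool(TOPIC & set(segment.lower().split()))

def classify(document: list[str]) -> str:
    """Classify from relevant segments only, instead of a single
    bag of words over the whole document."""
    words = [w for seg in document if relevant(seg) for w in seg.lower().split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative"

doc = [
    "the battery life is excellent",   # relevant segment, positive words
    "shipping was terrible and slow",  # off-topic segment: ignored
]
print(classify(doc))  # the off-topic negative segment does not sway the label
```

A flat bag-of-words model would let "terrible" in the off-topic segment cancel the on-topic sentiment; filtering by segment relevance is the structural gain the thesis quantifies.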
12

Ouentchist, Dogny Elysee. "Rôles et fonctionnement de structures signifiantes dans la modalisation de l'affichage à Abidjan." Thesis, Limoges, 2019. http://www.theses.fr/2019LIMO0056.

Abstract:
Through this thesis, we analyse the "stage" situation and the "strategy" situation of the poster in the city of Abidjan. It shows how different "meaningful structures" (the medium, space and time, and communication situations) interact with the icono-text in the poster's predication. The main hypothesis assumes a double manipulation that "puts into play" the poster and the city. We have shown, through semiotic and pragmatic theories, that on the one hand the poster in the city lives or survives by capturing and expressing the needs of the populations. On the other hand, the city exercises political control over the poster, allowing it into certain spaces and forbidding it others. The interactions between the city and the poster make it possible to analyse the Ivorian social context. Indeed, the posters in Abidjan show a diversity of dimensions: diversity of enunciators, of discourses, of media, of proposals, of senders, and of display spaces. Moreover, the intensity of postering denotes an anarchism that reveals deep social tensions: the hegemony of "small trades" and their sophistication, obscenity in urban writings, and economic extroversion; it also raises the issue of secularism.
13

McDonald, Daniel Merrill. "Combining Text Structure and Meaning to Support Text Mining." Diss., The University of Arizona, 2006. http://hdl.handle.net/10150/194015.

Abstract:
Text mining methods strive to make unstructured text more useful for decision making. As part of the mining process, language is processed prior to analysis. Processing techniques have often focused primarily on either text structure or text meaning in preparing documents for analysis. As approaches have evolved over the years, increases in the use of lexical semantic parsing usually have come at the expense of full syntactic parsing. This work explores the benefits of combining structure and meaning or syntax and lexical semantics to support the text mining process. Chapter two presents the Arizona Summarizer, which includes several processing approaches to automatic text summarization. Each approach has varying usage of structural and lexical semantic information. The usefulness of the different summaries is evaluated in the finding stage of the text mining process. The summary produced using structural and lexical semantic information outperforms all others in the browse task. Chapter three presents the Arizona Relation Parser, a system for extracting relations from medical texts. The system is a grammar-based system that combines syntax and lexical semantic information in one grammar for relation extraction. The relation parser attempts to capitalize on the high precision performance of semantic systems and the good coverage of the syntax-based systems. The parser performs in line with the top reported systems in the literature. Chapter four presents the Arizona Entity Finder, a system for extracting named entities from text. The system greatly expands on the combination grammar approach from the relation parser. Each tag is given a semantic and syntactic component and placed in a tag hierarchy. Over 10,000 tags exist in the hierarchy. The system is tested on multiple domains and is required to extract seven additional types of entities in the second corpus. The entity finder achieves a 90 percent F-measure on the MUC-7 data and an 87 percent F-measure on the Yahoo data where additional entity types were extracted. Together, these three chapters demonstrate that combining text structure and meaning in algorithms to process language has the potential to improve the text mining process. A lexical semantic grammar is effective at recognizing domain-specific entities and language constructs. Syntax information, on the other hand, allows a grammar to generalize its rules when possible. Balancing performance and coverage in light of the world's growing body of unstructured text is important.
14

Clément, Julien. "Algorithmes, mots et textes aléatoires." Habilitation à diriger des recherches, Université de Caen, 2011. http://tel.archives-ouvertes.fr/tel-00913127.

Abstract:
In this dissertation I examine different aspects of a simple but ubiquitous object in computer science: the sequence of symbols (called, depending on context, a word or a character string). The notion of a word lies at the crossroads of fields such as information theory and language theory. Simple as it is, it remains fundamental: at the lowest level it is all we have, since there always comes a point at which data must be encoded as symbols that can be stored in memory. The growing amount of data made available and stored, for example individual genomes or digitised documents, justifies optimising the algorithms and data structures that manipulate them. Consequently, analysis is needed to guide the choice and design of the programs that handle these data. Average-case analysis is particularly appropriate here, since the data reach such variety and volume that it is the typical case, not the worst case, that best reflects the complexity. This of course raises the problem of data modelling, which remains thorny. Indeed, we want two contradictory things: a model as close as possible to the data, truly reflecting their specificities, but also a model that can yield results, that is, predict performance (and one quickly sees that the model must therefore remain fairly simple for there to be any hope of handling it!). The methods are most often those of analytic combinatorics and rely on a mathematical object, generating functions, to carry out the analyses.
15

Salson, Mikaël. "Structures d'indexation compressées et dynamiques pour le texte." Rouen, 2010. http://www.theses.fr/2010ROUES042.

Abstract:
Compressed index structures (CIS) allow very fast search in large texts while using less space than the texts themselves. The advent of compressed indexes in 2000 made it possible to index entire mammalian genomes. We introduce a method that updates a compressed index to take into account modifications of the indexed text. Through theoretical and practical results, we show that our solution is much faster than rebuilding the index from scratch. We also propose a method for finding the minimum of a numeric sequence over a given interval; it is more space-efficient than other methods and allows the sequence to be updated. Finally, to search for millions of short sequences within a genome, we propose a method that significantly increases the percentage of located sequences and makes it possible, for example, to identify genetic mutations.
16

Hernandez, Nicolas. "Description et détection automatique de structures de texte." Paris 11, 2004. http://www.theses.fr/2004PA112329.

Abstract:
Les systèmes de recherche d'information ne sont pas adaptés pour une navigation intra-documentaire (résumé dynamique). Or celle-ci est souvent nécessaire pour évaluer la pertinence d'un document. Notre travail se situe dans une perspective de web sémantique. Notre objectif est d'enrichir les documents pour fournir aux systèmes, voire directement à l'utilisateur, des informations de description et d'organisation du contenu des documents. Les informations de nature descriptive concernent d'une part l'identification des expressions thématiques du discours, et d'autre part l'identification du type d'information sémantique ou rhétorique contenu dans une phrase donnée (par exemple la présentation du but de l'auteur, l'énonciation d'une définition, l'exposition d'un résultat, etc. ). L'identification des thèmes implémente deux approches distinctes l'une fondée sur la résolution d'anaphores, la seconde sur la construction de chaînes lexicales. En ce qui concerne l'identification des types d'information des phrases, nous proposons une méthode d'acquisition automatique de marques méta-discursives. L'objectif de détection de l'organisation du discours est envisagé selon deux approches. La première consiste à une analyse globale descendante du texte, en combinant une segmentation par cohésion lexicale, et un repérage de marques linguistiques de type introducteur de cadres (e. G. "En ce qui concerne X, En Corée, D'abord etc. "). La seconde approche vise une détection plus fine de l'organisation du discours en identifiant les relations de dépendance informationnelle entre les phrases (subordination et coordination)
Information Retrieval (IR) systems are not well adapted for text browsing and visualization (dynamic summarization), which users often need in order to evaluate the relevance of a document. Our work follows a Semantic Web perspective. We aim at annotating documents with abstract information about content description and discourse organization in order to give IR systems more capabilities. Descriptive information concerns both topic identification and the semantic and rhetorical classification of text extracts (with information such as "Our aim is. . . ", "This paper deals with. . . "). We implement a system to identify topical linguistic expressions based on a robust anaphora resolution system and the building of lexical chains. We also propose a method to automatically acquire meta-discursive material. We perform the detection of text structure through two complementary approaches. The first offers a top-down analysis based on the segmentation provided by lexical cohesion and by linguistic markers such as frame introducers. The second concerns local text organization, detecting informational relations (coordination and subordination) between subsequent sentences
17

Tirkkonen-Condit, Sonja. "Argumentative text structure and translation." Jyväskylä : University of Jyväskylä, 1985. http://catalog.hathitrust.org/api/volumes/oclc/13332106.html.

18

Lemos, Carolina Lindenberg. "Condições semióticas da repetição." Universidade de São Paulo, 2015. http://www.teses.usp.br/teses/disponiveis/8/8139/tde-09062015-111352/.

Abstract:
Trazida de áreas diversas dos estudos do homem, a repetição ganha papel central nesta tese de teor semiótico. Trata-se de fenômeno muito presente em todas as ações humanas e, em especial, nos textos. O caráter opcional de certas repetições nos textos traz à baila o problema de sua função, uma vez que, em certos casos, parece agir diretamente sobre o ritmo do conteúdo e o fluxo de entradas e saídas do campo de presença. Esse caráter regulador do ritmo divide a pesquisa em duas questões. De um lado, o efeito rítmico parece apontar para uma estrutura subjacente. Nesse sentido, podemos nos perguntar: qual é a configuração dessa estrutura? De que forma participa a repetição? Ou ainda, qual o seu lugar no esquema semiótico? De outro lado, a repetição parece envolver certa contradição: de que maneira um fenômeno que não traz novidades, apenas a retomada do conhecido, pode, por vezes, criar um efeito de tensão ou surpresa? Para responder a essas perguntas, partimos de uma revisão do papel da repetição em duas áreas vizinhas: a retórica e uma determinada corrente linguística. Essa discussão nos permitiu enxergar insuficiências nessas abordagens que podem ser supridas pela semiótica. Uma vez dentro da perspectiva semiótica, buscamos o lugar ocupado pela repetição, confrontando-a com conceitos como identificação, texto, língua e a própria noção de semiótica. Estabelecida a posição da repetição no texto, passamos a levantar e discutir as condições textuais necessárias para o aparecimento de repetições relevantes. Além do processo de identificação, a noção de saliência, baseada na oposição figura e fundo, revelou-se central para a explicação do fenômeno. Finalmente, a linearidade mostrou-se relevante, o que nos permitiu rediscutir seu estatuto teórico como uma manifestação possível da estrutura sintagmática subjacente. Tendo delineado as condições da repetição, iniciamos a investigação sobre os efeitos um tanto contraditórios que havíamos constatado nas ocorrências repetitivas. 
Vimos que a repetição pertence à ordem da extensidade (ela se conta, não se mede); sendo assim, é instrumento de manifestação de um ritmo do conteúdo que lhe é pressuposto. Nesses termos, a repetição está subordinada a valências intensivas como o andamento e a tonicidade. Para assegurar a pertinência de nossos argumentos, estudamos a repetição no interior de objetos selecionados, nos quais está a serviço da estruturação textual. As análises dos objetos acabaram por evidenciar as relações da repetição com a concepção de aspecto, e três estilos de progressão textual ligados à repetição se confirmaram: o circular, o linear e o espiral. Esse trajeto mostrou-nos em que termos a repetição se liga a uma estrutura subjacente e a manifesta, mas também de que forma essa estrutura não só explica, como gera as variações de ritmo e andamento que se fazem sentir por meio da repetição. As contradições aparentes dos efeitos da repetição se explicam pelas próprias bases epistemológicas da disciplina. O caráter analítico e relacional da semiótica está na base da construção repetitiva, que, sem acrescentar nada de novo, pode levar o enunciatário à tensão, ao clímax e à surpresa.
Emanating from different areas of the human sciences, repetition was given a central role in this thesis of semiotic inclination. It is a widespread phenomenon in all fields of human activity and, particularly, in texts. The optional character of certain repetitions brings about the problem of its function, since, in certain cases, it seems to act directly on the rhythm of the content and the flow of entrances and exits of the phenomenal field. This regulation of the rhythm divides the research into two fronts. On the one hand, the rhythmic effect points to an underlying structure. In that sense, one can ask: what is the configuration of such structure? In what way is repetition part of it? Or even, what is its place in the semiotic model? On the other hand, repetition seems to involve a certain degree of contradiction: in what way can a phenomenon that brings no novelty, only the resumption of the same, sometimes create an effect of tension or surprise? In order to answer these questions, we undertake the revision of the role of repetition in neighboring fields: rhetoric and a specific trend in linguistics. This discussion has allowed us to detect a few insufficiencies in these approaches that may be answered by semiotics. From the semiotic perspective, we have explored the place occupied by repetition, by opposing it to concepts such as identification, text, language and to the notion of semiotics itself. Once the position of repetition in the text is established, we move on to note and discuss the textual conditions necessary to the occurrence of relevant repetitions. In addition to identification, the notion of salience, based on the opposition between figure and ground, revealed itself to be central to the explanation of the phenomenon. Finally, linearity has also proven relevant, which allowed us to re-discuss its theoretical status as one possible manifestation of the underlying syntagmatic structure. 
Having outlined the conditions for repetition, we have started an investigation into the somewhat contradictory effects we had observed in repetitive incidents. We saw that repetition belongs to the order of extensity (it is counted, not measured) and, in being so, it is a tool for the manifestation of the rhythm of the content that is presupposed by it. In these terms, repetition is subordinated to the intensive sub-dimensions: tempo and tonicity. To ensure the relevance of our arguments, we studied repetition within some selected objects, where it is made to serve the structuring of the text. Finally, the analysis of these objects shed light on the relations between repetition and the concept of aspect, and three styles of textual progression related to repetition were confirmed: circular, linear and spiraling. This path of investigation has shown us the terms in which repetition is tied to an underlying structure and the way in which it manifests that structure. It has also revealed that such structure not only explains but also generates the variations in rhythm and tempo that are felt through repetition. The apparent contradictions of the effects of repetition are explained by the very epistemological bases of the field. The analytical and relational aspects of semiotics are the basis for repetitive construction, which, without adding any new information, may lead the enunciatee to tension, climax and surprise.
19

NUNES, IAN MONTEIRO. "CLUSTERING TEXT STRUCTURED DATA BASED ON TEXT SIMILARITY." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2008. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=25796@1.

Abstract:
O presente trabalho apresenta os resultados que obtivemos com a aplicação de grande número de modelos e algoritmos em um determinado conjunto de experimentos de agrupamento de texto. O objetivo de tais testes é determinar quais são as melhores abordagens para processar as grandes massas de informação geradas pelas crescentes demandas de data quality em diversos setores da economia. O processo de deduplicação foi acelerado pela divisão dos conjuntos de dados em subconjuntos de itens similares. No melhor cenário possível, cada subconjunto tem em si todas as ocorrências duplicadas de cada registro, o que leva o nível de erro na formação de cada grupo a zero. Todavia, foi determinada uma taxa de tolerância intrínseca de 5 porcento após o agrupamento. Os experimentos mostram que o tempo de processamento é significativamente menor e a taxa de acerto é de até 98,92 porcento. A melhor relação entre acurácia e desempenho é obtida pela aplicação do algoritmo K-Means com um modelo baseado em trigramas.
This document reports our findings on a set of text clustering experiments in which a wide variety of models and algorithms were applied. The objective of these experiments is to investigate the most feasible strategies for processing large amounts of information in the face of growing demands on data quality in many fields. The deduplication process was accelerated by dividing the data set into subsets of similar items. In the best-case scenario, each subset contains all duplicates of each record, reducing the clustering error to zero; an intrinsic tolerance of 5 percent after clustering was nonetheless established. The experiments show that the processing time is significantly lower, with a precision of up to 98.92 percent. The best accuracy/performance trade-off is achieved with the K-Means algorithm using a trigram-based model.
20

Wylie, Judith W. "Effects of prior knowledge and text structure on text memory." Thesis, Queen's University Belfast, 1993. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.359132.

21

Fuchs, Juliana Thiesen. "Rhetorical Structure Theory: limites e possibilidades de representação da organização textual." Universidade do Vale do Rio do Sinos, 2009. http://www.repositorio.jesuita.org.br/handle/UNISINOS/2569.

Abstract:
Nesta dissertação de mestrado, procuro mostrar a contribuição de determinadas concepções de organização textual para a representação do texto realizada pelo modelo da Rhetorical Structure Theory – RST (Mann; Thompson, 1988). A RST é uma teoria que explica a estrutura textual por meio de um modelo de relações que se estabelecem, recursivamente, entre partes do texto consideradas pelo analista como núcleos e satélites. Porém, apesar de abarcar a coerência retórica relacional, a RST, como teoria, não lida com outras concepções que dêem conta do processo complexo de organização textual. Dessa forma, como modelo, ela representa o texto de forma limitada. Neste trabalho, investigo a possibilidade de a RST ser associada a determinadas concepções de organização textual, como a relação entre texto e contexto e o processo estratégico top-down de formação do texto. Para tanto, realizo uma investigação em duas partes: uma teórica e uma de análise. Na parte teórica, apresento um quadro teórico que embasa as concepções de
In this master's thesis, I aim to show the contribution of some conceptions of textual organization to the text representation carried out by Rhetorical Structure Theory – RST (Mann; Thompson, 1988). RST is a theory that explains text structure by postulating a model of relations which recursively hold between parts of text labeled nucleus or satellite by the analyst. However, even though it accounts for rhetorical relational coherence, RST, as a theory, does not include other conceptions that account for the complex process of textual organization. Thus, as a model, it produces a limited text representation. In this work, I investigate the possibility of associating RST with some conceptions of textual organization, such as the relationship between text and context and the top-down strategic process of text construction. To do so, I carry out an investigation in two parts: a theoretical one and an analytical one. In the theoretical part, I present a theoretical framework that supports the conce
22

Lyra, Risto Matti Juhani. "Topical subcategory structure in text classification." Thesis, University of Sussex, 2019. http://sro.sussex.ac.uk/id/eprint/81340/.

Abstract:
Data sets with rich topical structure are common in many real world text classification tasks. A single data set often contains a wide variety of topics and, in a typical task, documents belonging to each class are dispersed across many of the topics. Often, a complex relationship exists between the topic a document discusses and the class label: positive or negative sentiment is expressed in documents from many different topics, but knowing the topic does not necessarily help in determining the sentiment label. We know from tasks such as Domain Adaptation that sentiment is expressed in different ways under different topics. Topical context can in some cases even reverse the sentiment polarity of words: to be sharp is a good quality for knives but bad for singers. This property can be found in many different document classification tasks. Standard document classification algorithms do not account for or take advantage of topical diversity; instead, classifiers are usually trained with the tacit assumption that topical diversity does not play a role. This thesis is focused on the interplay between the topical structure of corpora, how the target labels in a classification task distribute over the topics and how the topical structure can be utilised in building ensemble models for text classification. We show empirically that a dataset with rich topical structure can be problematic for single classifiers, and we develop two novel ensemble models to address the issues. We focus on two document classification tasks: document level sentiment analysis of product reviews and hierarchical categorisation of news text. For each task we develop a novel ensemble method that utilises topic models to address the shortcomings of traditional text classification algorithms. Our contribution is in showing empirically that the class association of document features is topic dependent. 
We show that using the topical context of documents for building ensembles is beneficial for some tasks, and present two new ensemble models for document classification. We also provide a fresh viewpoint for reasoning about the relationship of class labels, topical categories and document features.
23

Lafourcade, Mathieu. "Lexique et analyse sémantique de textes - structures, acquisitions, calculs, et jeux de mots." Habilitation à diriger des recherches, Université Montpellier II - Sciences et Techniques du Languedoc, 2011. http://tel.archives-ouvertes.fr/tel-00649851.

Abstract:
Semantic analysis of texts first requires the construction of objects belonging to lexical semantics. Idea vectors and lexical networks seem good candidates and together constitute complementary structures. In practice, however, one must still be able to build them. Idea vectors can be computed from corpora of dictionary definitions, thesauri, or texts. They come in several variants, namely conceptual vectors, anonymous vectors, and lexical vectors, each offering a different balance between precision, coverage, and practicality. As for lexical networks, they can be acquired efficiently through games, and this is precisely the purpose of the JeuxDeMots project. Semantic analysis can be approached through thematic analysis, and thus serve as a means of computing idea vectors (bootstrapping). We can model the analysis as an activation and propagation problem. The multiplicity of criteria that can come into play in a semantic analysis, and the inherent difficulty of defining a satisfactory control function, lead us to explore the use of bio-inspired metaheuristics. More precisely, we introduce an analysis model based on artificial ant colonies. Starting from a text, the analysis aims to build a graph containing the objects of the text (the words), objects identified as relevant (phrases, concepts), as well as weighted and typed relations between these objects.
24

Lau, Lai Lai Cubie. "The argument structure of fund-raising texts." HKBU Institutional Repository, 2001. http://repository.hkbu.edu.hk/etd_ra/385.

25

Hassan, Jawad. "Structured Text Compiler Targeting XML." Thesis, Växjö University, School of Mathematics and Systems Engineering, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-6441.

26

Young-Lai, Matthew. "Text structure recognition using a region algebra." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2001. http://www.collectionscanada.ca/obj/s4/f2/dsk3/ftp04/NQ60576.pdf.

27

Bisson, Marie. "Une édition numérique structurée à l’aide de la Text Encoding Initiative des textes montois de dom Thomas Le Roy : établissement critique des textes, recherches sur les sources, présentation littéraire et historique." Caen, 2015. http://www.theses.fr/2015CAEN1029.

Abstract:
Cette thèse a pour objet l’édition des textes de dom Thomas Le Roy sur l’histoire du Mont Saint-Michel. Au XVIIe siècle, ce moine mauriste a écrit conjointement trois versions de ses recherches sur l'histoire de l'abbaye. Eugène de Robillard de Beaurepaire en a fait une première édition scientifique au XIXe siècle, mais il s’est appuyé sur un seul texte et en a même occulté de nombreux passages. Nous proposons donc une édition critique des trois textes sur l’histoire montoise de dom Thomas Le Roy : une version longue et chronologique (Caen BM, Mancel 195) ; une brève histoire de l’abbaye du Mont Saint-Michel (Paris BNF, Latin 13818) ; une troisième version de 228 pages, thématique (Paris BNF, Français 18950). Nous avons comparé les manuscrits et avons tenté de retrouver les sources que l’historien mauriste avait utilisées. Nous avons choisi de publier le résultat de ce travail sous deux formes, papier et électronique, en faisant apparaître l’autorité de chaque partie de texte, en confrontant les différentes versions entre elles et en analysant le travail de réécriture des sources. En raison du temps imparti pour la thèse, nous avons dû faire des choix dans nos niveaux d'annotation et de discussion critique ; nous avons donc proposé pour le texte le plus long un moindre niveau d'annotation. L’édition numérique utilise la TEI (Text Encoding Initiative), qui est l’ensemble de recommandations actuellement expérimenté avec le plus de succès, dans le domaine des sciences humaines et sociales, pour décrire le contenu et la structure des documents écrits. Nous proposons en introduction de cette thèse une présentation littéraire, historique et méthodologique de nos travaux
The aim of this thesis is to establish a critical edition of the texts of dom Thomas Le Roy on the history of the Mont Saint-Michel. In the 17th century, this Maurist monk wrote three versions of his research on the abbey's history in parallel. In the 19th century, Eugène de Robillard de Beaurepaire produced the first scientific edition of this work, but he relied on a single manuscript and even omitted many passages. We therefore propose a critical edition of the three texts of dom Thomas Le Roy on the Mont's history: a long, chronological version (Caen BM, Mancel 195); a brief history of the abbey of the Mont Saint-Michel (Paris BNF, Latin 13818); and a thematic version of 228 pages (Paris BNF, Français 18950). We compared the manuscripts and tried to identify the sources the Maurist historian drew on to write the texts. Showing the authority of each part of the text, comparing the different versions with one another and analysing how the sources were rewritten, we chose to publish the result of this work in two forms: paper and electronic. Due to the limited time of the thesis, the scholarly annotation of the longest manuscript could not be completed, so a lighter level of annotation is proposed for that text. The electronic edition is encoded in XML (eXtensible Markup Language) following the TEI (Text Encoding Initiative), the set of guidelines currently applied with the greatest success to describe the content and structure of written documents in the humanities and social sciences. In the introduction to this thesis, we offer a literary, historical and methodological presentation of our work
28

Eler, Marcelo Medeiros. "Uso da técnica de teste estrutural para o teste e monitoração de serviços." Universidade de São Paulo, 2012. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-04092012-141341/.

Abstract:
A computação orientada a serviços propõe o desenvolvimento de software por meio da composição de serviços com os objetivos de aumentar o reúso de software e facilitar a criação de aplicações dinâmicas, flexíveis e com baixo acoplamento. O uso de serviços no desenvolvimento de software só é possível se os desenvolvedores de aplicações (integradores) confiarem na qualidade dos serviços oferecidos por terceiros. Uma forma de aumentar a confiança sobre serviços adquirido de terceiros é a realização de testes. Entretanto, o teste de serviços é difícil porque os testadores ficam limitados a usar técnicas de teste baseadas em especificação por causa da indisponibilidade do código fonte. Nesse contexto, os testadores não podem usufruir dos benefícios de combiná-las com técnicas baseadas em implementação, como a técnica estrutural, por exemplo. Uma abordagem para viabilizar o uso da técnica de teste estrutural no contexto de aplicações baseadas em serviços sem expor o código fonte dos serviços é apresentada. Ela propõe a criação de serviços testáveis, que são serviços com alta testabilidade e que possuem uma interface de teste cujas operações apoiam o teste estrutural. Integradores podem realizar o teste de um serviço testável e obter, sem acessar o código fonte, uma análise de cobertura. Metadados de teste também são fornecidos pelos serviços testáveis para auxiliar integradores na obtenção de uma cobertura estrutural maior. A abordagem também apoia atividades de monitoração ativa de serviços. A abordagem é genérica uma instanciação para apoiar o teste estrutural de serviços e aplicações escritos em Java é apresentada. Estudos de casos e experimentos controlados foram realizados para validar a abordagem instanciada. Os resultados mostram que a abordagem é viável e apresenta bons resultados quando comparada com o uso apenas da técnica funcional
Service-oriented computing proposes developing software through the composition of services. It promotes software reuse and the implementation of dynamic, flexible and loosely coupled applications. Services provide specific business functionalities and are delivered as black boxes. The use of services is only possible if the developers of service applications (integrators) trust the third-party services. Testing, in particular, is one way to gain confidence in third-party software. However, testers can only use specification-based testing techniques due to the unavailability of the source code. In this context, they cannot benefit from combining specification-based and implementation-based testing techniques. This work proposes an approach to introduce the structural testing technique in the context of service-based applications without revealing the source code. The proposed approach promotes the development of testable services, which are services with high testability that expose operations through a testing interface to support structural testing. Integrators can test testable services and obtain, without having access to the source code, a coverage analysis on structural criteria. Test metadata are also provided along with testable services to help integrators create more test cases and increase the coverage obtained. The proposed approach also supports active monitoring of services. The approach is generic, and an instantiation is presented to create testable services written in Java. Formal experiments and case studies were conducted to validate the proposed approach and the instantiation. The results provide evidence of the applicability and benefits of the approach for both testing and monitoring activities when compared to using only the functional approach
29

Karaouza, Efthymia. "Cohesion and text structure in Attic Greek prose." Thesis, University of Birmingham, 2007. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.442640.

30

Bouayad-Agha, Nadjet. "The role of document structure in text generation." Thesis, University of Brighton, 2001. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.366234.

31

Fanning, David John. "Schoenberg's monodrama "Erwartung" text, structure and musical language /." Online version, 1985. http://ethos.bl.uk/OrderDetails.do?did=1&uin=uk.bl.ethos.353718.

32

Mota, Filho Antonio. "Text structure and brazilian university student's writing proficiency." reponame:Repositório Institucional da UFSC, 1989. https://repositorio.ufsc.br/xmlui/handle/123456789/157596.

Abstract:
Master's thesis - Universidade Federal de Santa Catarina, Centro de Comunicação e Expressão.
Empirical research has demonstrated the importance of rhetorical organization in the comprehension and production of expository (or narrative) texts. The basic idea is that the rhetorical organization underlying a given text interacts with the reader's formal schema (their prior knowledge of and experience with rhetorical organization), influencing the comprehension and production of texts.
33

Holsgrove, John V. "Structure strategy use in children's comprehension of expository texts." Thesis, Edith Cowan University, Research Online, Perth, Western Australia, 2011. https://ro.ecu.edu.au/theses/398.

Abstract:
This study reviewed a body of literature largely written between the mid 1970s and 1990s that was concerned with the rhetorical structure of written expository text and its relationship to memory and comprehension. This dissertation follows from an argument that the earlier research often confused memory and comprehension and that it was limited in its attempt to clarify the relationship between text structure and reading comprehension. The current study sought to provide a fuller description of the manner in which schoolchildren of different ages and abilities employ rhetorical structure in the comprehension process. In contrast to the earlier research this study makes a distinction between the top-level structure of a text and the structure of the reader’s meaning. It sought to discover what, if any, was the relationship between the structure of the reader’s comprehension and the top-level structure of the text, the educational stage of the reader, and the reading comprehension ability of the reader. A sample of 229 schoolchildren from Years 5, 7, and 9, and further subdivided by reading ability, was given a task of reading three passages and carrying out an underlining task to identify the seven sentences in each passage that best captured its overall meaning. The three passages employed were natural passages of text, each approximately 700 words in length, and each with a different top-level structure. Minor adjustments were made in respect of vocabulary and sentence length to match the different age groups within the sample. Each participant’s sentence selections were analysed for a collective structure in an effort to discover any structure employed by the reader in constructing the meaning of the respective text. The effectiveness of structure usage was measured by the degree of coherence captured by the sentence selections.
As might be expected, good readers and older children generally performed the task more successfully and effectively than poorer and younger readers. The results indicated, contrary to a common assumption of the earlier research, that the structures employed by the participants reflected two distinct categories: content structures, which selected information based on association, and rhetorical structures, based on logical argument. It was subsequently considered that semantic information might be relatively more influential in using content structure, whereas syntax might play the more significant role in the use of rhetorical structure. The more able readers generally maximised coherence by combining rhetorical and content structures in the construction of meaning, except where a passage was limited to description only. There was a complex relationship between the structure of the text and the structure of the reader's meaning that reflected a constructivist explanation of reading comprehension. It was found that whilst many children of all ages and abilities had a capacity to recognise the various content and rhetorical structures regardless of their relative complexity, effective use was related to practice. Other factors that might complicate structure strategy use in reading comprehension were identified.
34

Silva, Patricia Andrade da. "Mapas e redes conceituais: uma proposta metodológica para a sua construção a partir de textos." Universidade de São Paulo, 2012. http://www.teses.usp.br/teses/disponiveis/81/81132/tde-20092016-105920/.

Abstract:
A elaboração de textos por alunos em resposta a questões dissertativas dentro do contexto escolar parece ser a forma mais convencional de se tentar avaliar o que os alunos sabem. O texto que um indivíduo produz procura refletir de forma aproximada a sua estrutura de conhecimentos sobre determinado tema. A leitura e análise de textos são tarefas que exigem um tempo considerável no dia-a-dia de um professor ou pesquisador e, quando há o interesse em conhecer as ideias mais relevantes sobre determinado tema para um grupo de alunos, a tarefa é ainda mais trabalhosa. O principal objetivo desta pesquisa consiste em desenvolver uma metodologia que utiliza ferramentas computacionais para transformar textos escritos por alunos em estruturas gráficas como mapas e redes de conceitos. A utilidade desta metodologia aparece tanto no contexto da pesquisa em ensino quanto na própria prática docente, já que o produto final de sua aplicação pode permitir estabelecer inferências quanto à estrutura de conhecimentos de um grupo de alunos. A investigação ocorreu a partir de dados coletados em duas disciplinas distintas de cursos de graduação do IQ-USP. Os dados coletados referem-se a produções textuais de 42 estudantes em resposta a uma questão que fornecia alguns conceitos pré-estabelecidos. A partir das respostas dos alunos foram realizados testes: (i) com dois softwares de análise de textos para a quantificação das relações entre conceitos; (ii) para verificar a influência na quantificação das relações entre conceitos partindo-se do texto como foi escrito e das proposições extraídas do mesmo e (iii) para a obtenção de diferentes tipos de estruturas gráficas. A partir dos testes realizados, foi possível concluir que o programa Hamlet® é mais eficiente e prático do que o programa ALA-Reader® para os objetivos da presente pesquisa. 
Além disso, a matriz gerada pelo Hamlet® para quantificar as relações entre conceitos depende essencialmente da estrutura do texto em questão - texto original ou texto modificado. Os três tipos de estruturas gráficas construídos apresentam diferentes focos, porém, podem ser considerados complementares. As redes V+P se mostraram interessantes para análises centradas nos conceitos pré-estabelecidos e fornecidos na questão que originou os textos dos alunos. As redes a partir de corte percentual apresentaram-se como representações bastante úteis para investigações interessadas em fazer um recorte ou destacar os aspectos considerados mais relevantes pelos alunos sobre determinado tema. Os mapas conceituais construídos neste trabalho mostraram-se como representações extremamente valiosas para conhecer a aproximada estrutura de conhecimentos dos grupos de alunos, uma vez que explicitam a natureza das relações proposicionais entre os conceitos. A construção de mapas conceituais partindo-se tanto dos textos originais quanto dos textos modificados permitiu concluir que as estruturas gráficas obtidas dos dois modos se aproximam bastante uma da outra, apresentando alta semelhança. Esta semelhança sugere que a utilização do programa Hamlet® para a obtenção de matrizes que quantificam relações entre conceitos presentes em um texto na forma como foi escrito é eficiente quando comparada ao processo manual e mais demorado de se extraírem proposições de um texto para obter uma matriz.
The drafting of essays by students in response to essay questions in the school context seems to be the most conventional way to assess students' knowledge. The essay produced by a student approximately reflects his or her knowledge structure in a given domain. Reading and analysing essays are tasks that require considerable time in a teacher's or researcher's routine and, when the interest lies in knowing the most important ideas a group of students holds about a certain topic, the task is even harder. The main objective of this research is to develop a methodology that uses computational tools to transform written essays into graphic structures such as concept maps and network graphs. This methodology could be useful not only for teaching research but also for teaching practice, since the final product of its application may support inferences about the knowledge structure of a group of students. The investigation was based on data collected from two distinct undergraduate courses at IQ-USP. The data refer to written essays of 42 students in response to an essay question that provided some pre-established concepts. From the students' responses, tests were performed: (i) with two text-analysis software packages, with a view to quantifying the relationships between concepts; (ii) to investigate the influence on that quantification of starting from the original text versus from the propositions extracted from it; and (iii) to obtain different types of graphic structures. From the tests performed, it was possible to conclude that Hamlet® is a more efficient and convenient program than ALA-Reader® for the objectives of this research. Furthermore, the matrix generated by the Hamlet® program to quantify the relationships between concepts depends essentially on the structure of the essay, whether the original text or the modified text.
The three types of graphic structures that were built have different focuses; however, they may be considered complementary. The (V+P) network graphs are interesting for analyses centred on the pre-established concepts provided in the essay question. Network graphs built from a percentage cut-off are useful representations for investigations interested in highlighting the aspects students consider most relevant about a given topic. The concept maps constructed in this work proved extremely valuable for knowing the approximate knowledge structure of the student groups, since they make explicit the nature of the propositional relationships between concepts. Constructing concept maps from both the original and the modified texts showed that the two resulting graphic structures are very close to each other, being highly similar. This similarity suggests that using the Hamlet® program to obtain matrices that quantify relationships between concepts in a text as written is effective compared with the manual, time-consuming process of extracting propositions from the text to obtain a matrix.
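The quantification step described above, turning an essay into a matrix of concept relations, can be illustrated with a minimal sketch. The sentence splitting, concept list, and co-occurrence criterion below are illustrative assumptions, not the actual procedure implemented by Hamlet®.

```python
from itertools import combinations

def concept_matrix(text, concepts):
    """Count, for each pair of pre-established concepts, how many
    sentences mention both (a crude proxy for a relation)."""
    n = len(concepts)
    index = {c: i for i, c in enumerate(concepts)}
    matrix = [[0] * n for _ in range(n)]
    for sentence in text.lower().split("."):
        present = [c for c in concepts if c in sentence]
        for a, b in combinations(present, 2):
            matrix[index[a]][index[b]] += 1
            matrix[index[b]][index[a]] += 1
    return matrix

# Toy example: three concepts, two sentences.
concepts = ["atom", "molecule", "bond"]
text = "An atom joins another atom through a bond. A molecule has a bond."
m = concept_matrix(text, concepts)
```

Network graphs such as those discussed above can then be derived from the non-zero entries of this matrix.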
35

Dorante, Alessandra. "Investigação de processo de conversão automática de textos estruturados para hiperdocumentos." Universidade de São Paulo, 1997. http://www.teses.usp.br/teses/disponiveis/76/76132/tde-15092010-164303/.

Abstract:
Esta dissertação investiga o processo de conversão automática de textos estruturados para hiperdocumentos. Analisa vantagens e desvantagens da utilização de um processo automático. Faz um levantamento detalhado das etapas envolvidas nesta conversão. Como resultado da pesquisa propõe um processo de conversão baseado em definições formais da estrutura dos documentos e das citações. O domínio de aplicação do processo de conversão é o conjunto de normas estatutárias jurídicas brasileiras. Outro resultado deste trabalho é a ferramenta WebifyLaw que implementa o processo de conversão automática para o conjunto das normas estatutárias jurídicas brasileiras. Os resultados da aplicação da WebifyLaw na Constituição Federal, no Código Civil e no Código de Processo Civil e em outras 42 normas são apresentados e discutidos.
This work investigates the automatic conversion of structured texts into hyperdocuments. It analyses the advantages and disadvantages of such an automatic process and details the steps involved in the conversion. As one of its results, it proposes an automatic conversion process based on formal definitions of document structure and citations. The application domain is the set of Brazilian statutory legal norms. Another contribution of this work is a tool called WebifyLaw, which implements the automatic conversion process for the chosen domain. The tool was applied to the Brazilian Constitution, the Civil Code, the Code of Civil Procedure, and 42 other norms. The results obtained with this application are also presented and discussed.
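The structure-driven conversion described above can be sketched as a rewrite rule that recognizes citations and turns them into hyperlinks. The citation pattern and URL scheme here are illustrative assumptions, not WebifyLaw's actual formal definitions.

```python
import re

# Hypothetical pattern for citations such as "Art. 5" in a legal text;
# the real system relies on formal definitions of document structure.
CITATION = re.compile(r"Art\.\s*(\d+)")

def linkify(text, base_url="norm.html"):
    """Rewrite each article citation as an HTML hyperlink anchor."""
    return CITATION.sub(rf'<a href="{base_url}#art-\1">Art. \1</a>', text)

html = linkify("Conforme o Art. 5 e o Art. 37.")
```

A full converter would apply one such rule per structural element (articles, paragraphs, incisos) and emit the surrounding hyperdocument skeleton as well.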
36

Guerdoud, Mohand. "Acquisition de connaissances a partir de textes structures." Paris 6, 1997. http://www.theses.fr/1997PA066365.

Abstract:
The goal of this work is to design a platform for acquiring knowledge from texts describing natural concepts, in order to build knowledge bases. This acquisition system, which cannot entirely dispense with human intervention, is nevertheless designed to reduce as far as possible the number of interactions with the expert user during the acquisition process. These interactions are indispensable both for capturing the expert's knowledge and for resolving problems of ambiguity or choice. The texts we are interested in have the particularity of possessing an underlying structure, linked to the structure of the objects described, that recurs from one concept to the next. This regularity helps the reader compare descriptions, and it varies to some extent with the authors and the group of concepts described. Our work consists first in designing a base model suited to the types of knowledge to be acquired. Different acquisition scenarios of increasing complexity are then studied. The simplest scenario leaves the acquisition process entirely to the user. The most complex scenario exploits the regularity of the texts to reduce the actions and indications required of the expert during acquisition. Taking advantage of this regularity, the chosen solution uses a new method of multiple text alignment under the constraint of a model, in order to propagate the expert's indications efficiently from one text to the next. The resulting acquisition system is incremental, interactive thanks to a resolutely modeless interface, and guarantees at each step that the consistency of the base is maintained. The privileged application domain of this work is undoubtedly systematics, a field where preserving expertise appears crucial today and whose writings have accumulated over several centuries.
Descriptions of natural concepts such as those found in the monographs of systematics, which describe plant and animal biodiversity, are important sources of applications. However, texts describing concepts with similar regularities are also found in other scientific fields such as medicine, geology, and mineralogy.
37

Davis, Marcia H. "Effects of text markers and familiarity on component structures of text-based representations." College Park, Md. : University of Maryland, 2006. http://hdl.handle.net/1903/4086.

Abstract:
Thesis (Ph. D.) -- University of Maryland, College Park, 2006.
Thesis research directed by: Human Development. Title from t.p. of PDF. Includes bibliographical references. Published by UMI Dissertation Services, Ann Arbor, Mich. Also available in paper.
38

Lemarié, Julie. "La compréhension des textes visuellement structurés : le cas des énumérations." Toulouse 2, 2006. http://www.theses.fr/2006TOU20041.

Abstract:
Nos recherches portent sur l'influence des signaux visuels (titres, énumérations,. . . ) sur la compréhension de textes. Notre thèse est la suivante : comprendre un texte ne consiste pas seulement à comprendre son contenu propositionnel mais également à interpréter ses propriétés visuelles. Cette thèse vise à enrichir les modèles existants de la compréhension en mettant l'accent sur les processus impliqués dans la compréhension et spécifiques au traitement cognitif des textes écrits. Afin d'éprouver cette thèse, nous évaluons la contribution du Modèle d'Architecture Textuelle, modèle logico-linguistique, à l'étude de la compréhension de textes. Ce modèle permet d'analyser la portée sémantique des signaux visuels des textes. Nous éprouvons expérimentalement différentes hypothèses dérivées du modèle. Les résultats indiquent que des textes identiques sur le plan de leur contenu propositionnel mais différant sur le plan de leur signalisation visuelle peuvent donner lieu à différentes interprétations
Our research deals with the influence of visual signals (headings, enumerations, etc.) on text comprehension. Our general claim is that text comprehension is not restricted to interpreting the text's propositional content but also consists in interpreting its visual properties. This claim aims to enrich existing comprehension models: we shed light on processes involved in comprehension that are specific to the cognitive processing of written texts. To test this claim, we evaluate the contribution of the Textual Architecture Model to the study of text comprehension. This model offers means to analyse the semantic scope of a text's visual signals. We experimentally test different hypotheses derived from the model. The results indicate that texts with the same propositional content but different visual signalling devices give rise to different interpretations.
39

Ågren, Ola. "Finding, extracting and exploiting structure in text and hypertext /." Umeå, 2009. http://opac.nebis.ch/cgi-bin/showAbstract.pl?u20=9789172647992.

40

Ågren, Ola. "Finding, extracting and exploiting structure in text and hypertext." Doctoral thesis, Umeå universitet, Institutionen för datavetenskap, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-22352.

Abstract:
Data mining is a fast-developing field of study, using computations to either predict or describe large amounts of data. The increase in data produced each year goes hand in hand with this, requiring ever more efficient algorithms in order to find interesting information within a given time. In this thesis, we study methods for extracting information from semi-structured data, for finding structure within large sets of discrete data, and for efficiently ranking web pages in a topic-sensitive way. The information extraction research focuses on support for keeping both documentation and source code up to date at the same time. Our approach to this problem is to embed parts of the documentation within strategic comments of the source code and then extract them using a specific tool. The structures that our structure mining algorithms are able to find among crisp data (such as keywords) take the form of subsumptions, i.e. one keyword is a more general form of the other. We can use these subsumptions to build larger structures in the form of hierarchies or lattices, since subsumptions are transitive. Our tool has been used mainly as input to data mining systems and for visualisation of data-sets. The main part of the research has been on ranking web pages in such a way that both the link structure between pages and the content of each page matter. We have created a number of algorithms and compared them to other algorithms in use today. Our focus in these comparisons has been on convergence rate, algorithm stability, and how relevant the answer sets from the algorithms are according to real-world users. The research has focused on the development of efficient algorithms for gathering and handling large data-sets of discrete and textual data. A proposed system of tools is described, all operating on a common database containing "fingerprints" and meta-data about items.
This data could be searched by various algorithms to increase its usefulness or to find the real data more efficiently. All of the methods described handle data in a crisp manner, i.e. a word or a hyper-link either is or is not a part of a record or web page. This means that we can model their existence in a very efficient way. The methods and algorithms that we describe all make use of this fact.
Informationsutvinning (som ofta kallas data mining även på svenska) är ett forskningsområde som hela tiden utvecklas. Det handlar om att använda datorer för att hitta mönster i stora mängder data, alternativt förutsäga framtida data utifrån redan tillgänglig data. Eftersom det samtidigt produceras mer och mer data varje år ställer detta högre och högre krav på effektiviteten hos de algoritmer som används för att hitta eller använda informationen inom rimlig tid. Denna avhandling handlar om att extrahera information från semi-strukturerad data, att hitta strukturer i stora diskreta datamängder och att på ett effektivt sätt rangordna webbsidor utifrån ett ämnesbaserat perspektiv. Den informationsextraktion som beskrivs handlar om stöd för att hålla både dokumentationen och källkoden uppdaterad samtidigt. Vår lösning på detta problem är att låta delar av dokumentationen (främst algoritmbeskrivningen) ligga som blockkommentarer i källkoden och extrahera dessa automatiskt med ett verktyg. De strukturer som hittas av våra algoritmer för strukturextraktion är i form av underordnanden, exempelvis att ett visst nyckelord är mer generellt än ett annat. Dessa samband kan utnyttjas för att skapa större strukturer i form av hierarkier eller riktade grafer, eftersom underordnandena är transitiva. Det verktyg som vi har tagit fram har främst använts för att skapa indata till ett informationsutvinningssystem samt för att kunna visualisera indatan. Huvuddelen av den forskning som beskrivs i denna avhandling har dock handlat om att kunna rangordna webbsidor utifrån både deras innehåll och länkarna som finns mellan dem. Vi har skapat ett antal algoritmer och visat hur de beter sig i jämförelse med andra algoritmer som används idag. Dessa jämförelser har huvudsakligen handlat om konvergenshastighet, algoritmernas stabilitet givet osäker data och slutligen hur relevant algoritmernas svarsmängder har ansetts vara utifrån användarnas perspektiv. 
Forskningen har varit inriktad på effektiva algoritmer för att hämta in och hantera stora datamängder med diskreta eller textbaserade data. I avhandlingen presenterar vi även ett förslag till ett system av verktyg som arbetar tillsammans på en databas bestående av “fingeravtryck” och annan meta-data om de saker som indexerats i databasen. Denna data kan sedan användas av diverse algoritmer för att utöka värdet hos det som finns i databasen eller för att effektivt kunna hitta rätt information.
AlgExt, CHiC, ProT
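The subsumption mining described above can be sketched as follows: keyword A is taken to subsume keyword B when every record containing B also contains A. The crisp set representation and toy records below are illustrative assumptions, not the thesis's actual algorithm.

```python
def subsumptions(records):
    """Return pairs (general, specific): 'general' occurs in every
    record where 'specific' occurs, over crisp keyword sets."""
    keywords = set().union(*records)
    pairs = []
    for a in keywords:
        for b in keywords:
            if a != b and all(a in r for r in records if b in r):
                pairs.append((a, b))
    return pairs

# Toy record set: every record mentioning a specific animal also
# mentions the general keyword "animal".
records = [{"animal", "dog"}, {"animal", "cat"}, {"animal"}]
found = set(subsumptions(records))
```

Because subsumption is transitive, such pairs can then be assembled into the hierarchies or lattices mentioned above.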
41

Dunning, Ted Emerson. "Finding structure in text, genome and other symbolic sequences." Thesis, University of Sheffield, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.310811.

42

Oakley, Angela L. "Typesetting of integrated scientific text and chemical structure diagrams." Thesis, University of Portsmouth, 1989. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.237873.

43

Hirose, Koji. "Effects of text structure instruction on Japanese EFL students." Thesis, University of Leicester, 2014. http://hdl.handle.net/2381/28619.

Abstract:
An instructional approach to replace the traditional Yakudoku method is required for the instruction of text comprehension. The traditional Yakudoku method focuses on translating English into Japanese sentence by sentence, which disturbs the flow of text comprehension and results in a loss of meaning. One way to resolve this may be to direct students' attention to the whole text through the learning of text structure. While the effect of text structure instruction has been demonstrated in the L1 context, little empirical research has examined the effectiveness of teaching text structure to Japanese students. The present study investigated the effects of teaching text structure. A mixed methods design was employed, with an emphasis on a quantitative approach. Instruction was given to college students over a total of seven lessons. Reading comprehension tests, recall tests, and questionnaires were used as data collection methods, complemented by interviews. The results showed that the intervention strongly improved the participants' reading comprehension; the lower group in particular benefited greatly. Recall data collected from all the participants did not indicate a significant increase for the comparison organisation, although the extracted participants significantly increased the amount of information recalled. No significant increase was produced for the problem/solution organisation, while the lower experimental participants produced a slight increase. The intervention modestly altered students' identification of the comparison and problem/solution organisations, especially for the lower experimental participants. The results also indicated that at the onset, more than half of the participants lacked knowledge of text structure. Through the intervention, the number of experimental participants who could identify the rhetorical organisation rose.
These results suggest that teaching text structure is effective in helping students with low reading ability read expository text.
44

Christensen, Jamie Lynn. "Enhancing Students' Science Content Knowledge Through Text Structure Awareness." Diss., CLICK HERE for online access, 2008. http://contentdm.lib.byu.edu/ETD/image/etd2564.pdf.

45

Thomas, Karen. "Deepening Understanding of Science Content Through Text Structure Instruction." Diss., CLICK HERE for online access, 2009. http://contentdm.lib.byu.edu/ETD/image/etd3075.pdf.

46

Eisenberg, Joshua Daniel. "Automatic Extraction of Narrative Structure from Long Form Text." FIU Digital Commons, 2018. https://digitalcommons.fiu.edu/etd/3912.

Abstract:
Automatic understanding of stories is a long-time goal of the artificial intelligence and natural language processing research communities. Stories literally explain the human experience. Understanding our stories promotes the understanding of both individuals and groups of people: various cultures, societies, families, organizations, governments, and corporations, to name a few. People use stories to share information. Stories are told, by narrators, in linguistic bundles of words called narratives. My work has given computers awareness of narrative structure: specifically, where the boundaries of a narrative lie in a text. This is the task of determining where a narrative begins and ends, a non-trivial task, because people rarely tell one story at a time. People don't specifically announce when they are starting or stopping their stories: they interrupt each other; they tell stories within stories. Before my work, computers had no awareness of narrative boundaries, essentially where stories begin and end. My programs can extract narrative boundaries from novels and short stories with an F1 of 0.65. Before this I worked on teaching computers to identify which paragraphs of text have story content, with an F1 of 0.75 (which is state of the art). Additionally, I have taught computers to identify the narrative point of view (POV; how the narrator identifies themselves) and diegesis (how involved the narrator is in the story's action) with an F1 of over 0.90 for both narrative characteristics. For the narrative POV, diegesis, and narrative level extractors I ran annotation studies, with high agreement, that allowed me to teach computational models to identify structural elements of narrative through supervised machine learning. My work has given computers the ability to find where stories begin and end in raw text. This allows for further automatic analysis, such as extraction of plot, intent, event causality, and event coreference.
These tasks are impossible when the computer can't distinguish which stories are told in which spans of text. There are two key contributions in my work: 1) my identification of features that accurately extract elements of narrative structure, and 2) the gold-standard data and reports generated from running annotation studies on identifying narrative structure.
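The F1 scores quoted above are the harmonic mean of precision and recall. For reference, a minimal computation from raw counts (the counts below are made up for illustration, not taken from the thesis):

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall from raw counts of
    true positives, false positives, and false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Illustrative counts yielding an F1 close to the 0.65 quoted above.
score = f1_score(tp=60, fp=30, fn=35)
```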
47

Van, Blommestein Erane. "Production factors for written expository texts." Thesis, University of British Columbia, 1991. http://hdl.handle.net/2429/30415.

Abstract:
Expository text writing is a task that demands high-level cognitive and linguistic skill in order to produce well-written texts. Individuals who have cognitive-communicative impairments following mild closed head injury often display difficulty in organization, recall and attention when writing texts. The purpose of this study was to investigate factors that facilitate production of coherent expository texts by two unimpaired adults, with the ultimate goal of applying the results to work with head-injured individuals. These factors were: type of texts and type of support found in the text elicitation context. It was hypothesized that Description texts would be easiest to produce, followed by Comparison, Sequence, and Response texts. It was also hypothesized that texts that were supported in the elicitation context by explicit information regarding text structure would result in more coherent texts than those written without such support. Furthermore, texts that were supported by structure plus content information were hypothesized to result in texts that were most coherent. Finally, it was questioned whether texts that were produced in the absence of support, but after the two support conditions had been completed, would exhibit a learning effect. Therefore, the effect of four elicitation contexts and four text types were examined. Each subject wrote sixteen texts. Text adequacy was measured using cohesive harmony analysis (Hasan, 1984, 1985) and a reader rating scale that was intended to measure perceived coherence. Results from Subject One were consistent with the hypothesized order of text difficulty. As well, the conditions in which text structure was provided generally resulted in more coherent texts than the texts produced without support. Evidence for a learning effect in the last condition was not found. 
Because the addition of content did not appear to increase text coherence when compared to texts produced with structural support alone, particularly for easier text types, it was suggested that a ceiling effect may have occurred for this subject, so that additional reduction of processing demands did not result in improved text production. The results from Subject Two were inconclusive, particularly for the effect of elicitation context. Order of text type difficulty differed from the expected order for this subject's texts. This demonstrates the variability that occurs among unimpaired writers in both text coherence and how writing tasks are approached, as well as the need for further studies using larger samples. Text ratings by a group of Speech-Language Pathologists did not match the results of the cohesive harmony analysis for text type. It was suggested that this disparity may be due to: inadequacies in cohesive harmony analysis that make it insensitive to features of texts readers use in order to determine coherence; or differences among texts in the readers' ability to construct text structure as they read. Texts produced in contexts with support generally received higher perceived coherence ratings than those written without such support. Inter-rater variability was marked, especially for texts low in cohesive harmony. Modifications to the procedures used in this study for both further research and clinical application are discussed.
Faculty of Medicine, School of Audiology and Speech Sciences, Graduate
48

Kou, Huaizhong. "Génération d'adaptateurs web intelligents à l'aide de techniques de fouilles de texte." Versailles-St Quentin en Yvelines, 2003. http://www.theses.fr/2003VERS0011.

Abstract:
Cette thèse définit un système d'informations Web d'intégration sémantique, appelé SEWISE qui peut intégrer des informations textuelles provenant de différentes sources Web. Dans SEWISE les adaptateurs Web sont construits autour de différents sites Web pour extraire automatiquement des informations intéressantes. Des technologies de fouille de texte sont alors employées pour découvrir des sémantiques abordées dans les documents. SEWISE peut assister à la recherche des informations sur le Web. Trois problèmes liés à la catégorisation de document sont étudiés. Premièrement, nous étudions les approches de sélection de termes et nous proposons deux approches CBA et IBA pour choisir ces termes. Puis, pour estimer des associations statistiques entre termes, un modèle mathématique est proposé. Finalement, les algorithmes de calculs de scores de catégories employées par des classificateurs k-NN sont étudiés. Deux algorithmes pondérés CBW et IBW pour calculer des scores de catégories sont proposés
This thesis defines a framework for semantically integrating Web information, called SEWISE. It can integrate text information from various Web sources belonging to an application domain into a common domain-specific concept ontology. In SEWISE, Web wrappers are built around different Web sites to automatically extract interesting information from them. Text mining technologies are then used to discover the semantics the Web documents talk about. SEWISE can ease topic-oriented information searches over the Web. Three problems related to document categorization are studied. First, we investigate approaches to feature selection and propose two approaches, CBA and IBA, for selecting features. A mathematical model is proposed to estimate statistical term associations and integrate them within a document similarity model. Finally, the category score calculation algorithms used by k-NN classifiers are studied, and two weighted algorithms, CBW and IBW, for calculating category scores are proposed.
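The k-NN category scoring studied above can be sketched as a similarity-weighted vote over the k nearest training documents. The weighting shown is a generic baseline, not the thesis's CBW or IBW algorithms.

```python
from collections import defaultdict

def category_scores(neighbors):
    """Sum similarity-weighted votes per category over the k nearest
    neighbors, given (category, similarity) pairs."""
    scores = defaultdict(float)
    for category, similarity in neighbors:
        scores[category] += similarity
    return dict(scores)

# Toy neighborhood: three nearest documents with cosine similarities.
neighbors = [("sports", 0.9), ("politics", 0.4), ("sports", 0.7)]
scores = category_scores(neighbors)
best = max(scores, key=scores.get)
```

The CBW and IBW variants proposed in the thesis differ in how each neighbor's vote is weighted, but follow this overall shape.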
49

Forsyth, Richard. "Stylistic structures : a computational approach to text classification." Thesis, University of Nottingham, 1996. http://eprints.nottingham.ac.uk/13445/.

Full text of the source
Citation styles: APA, Harvard, Vancouver, ISO, etc.
Abstract:
The problem of authorship attribution has received attention both in the academic world (e.g. did Shakespeare or Marlowe write Edward III?) and outside (e.g. is this confession really the words of the accused or was it made up by someone else?). Previous studies by statisticians and literary scholars have sought "verbal habits" that characterize particular authors consistently. By and large, this has meant looking for distinctive rates of usage of specific marker words -- as in the classic study by Mosteller and Wallace of the Federalist Papers. The present study is based on the premiss that authorship attribution is just one type of text classification and that advances in this area can be made by applying and adapting techniques from the field of machine learning. Five different trainable text-classification systems are described, which differ from current stylometric practice in a number of ways, in particular by using a wider variety of marker patterns than customary and by seeking such markers automatically, without being told what to look for. A comparison of the strengths and weaknesses of these systems, when tested on a representative range of text-classification problems, confirms the importance of paying more attention than usual to alternative methods of representing distinctive differences between types of text. The thesis concludes with suggestions on how to make further progress towards the goal of a fully automatic, trainable text-classification system.
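The marker-word approach the abstract attributes to Mosteller and Wallace can be sketched generically: compute each candidate author's rate of usage of a fixed set of function words, then attribute a disputed text to the candidate with the closest profile. The marker list and the tiny texts below are illustrative stand-ins, not data from the Federalist study.

```python
from collections import Counter
import math

# Illustrative marker words; Mosteller and Wallace used function words
# such as "upon", "while", and "whilst" in the Federalist study.
MARKERS = ["upon", "while", "whilst", "by", "of", "to"]

def marker_rates(text):
    """Relative frequency (per 1000 tokens) of each marker word."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    n = max(len(tokens), 1)
    return [1000.0 * counts[w] / n for w in MARKERS]

def distance(a, b):
    """Euclidean distance between two marker-rate profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# attribute a disputed text to the candidate with the closest profile
candidates = {"A": "upon the whole by and by upon reflection",
              "B": "while we wait whilst the matter rests while"}
disputed = "upon my word upon this point by all means"
profiles = {k: marker_rates(v) for k, v in candidates.items()}
best = min(profiles,
           key=lambda k: distance(profiles[k], marker_rates(disputed)))
```

The thesis's contribution, by contrast, is to discover such markers automatically rather than fixing the list by hand.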
50

JU, QI. "Large-scale Structural Reranking for Hierarchical Text Categorization." Doctoral thesis, Università degli studi di Trento, 2013. https://hdl.handle.net/11572/369177.

Full text of the source
Citation styles: APA, Harvard, Vancouver, ISO, etc.
Abstract:
Current hierarchical text categorization (HTC) methods mainly fall into three groups: (1) the flat one-vs.-all approach, which flattens the hierarchy into independent nodes and trains a binary one-vs.-all classifier for each node; (2) the top-down method, which uses the hierarchical structure to decompose the entire problem into a set of smaller sub-problems and deals with these sub-problems in top-down fashion along the hierarchy; and (3) the big-bang approach, which learns a single (but generally complex) global model for the class hierarchy as a whole with a single run of the learning algorithm. These methods were shown to provide relatively high performance in previous evaluations. However, they still suffer from two main drawbacks: (1) relatively low accuracy, as they disregard category dependencies, or (2) low computational efficiency when considering such dependencies. In order to build an accurate and efficient model we adopted the following strategy. First, we design advanced global reranking models (GR) that exploit structural dependencies in hierarchical multi-label text classification (TC). They are based on two algorithms: (1) generating the k-best classification hypotheses from the decision probabilities of the flat one-vs.-all and top-down methods; and (2) encoding dependencies in the reranker by (i) modeling hypotheses as trees derived from the hierarchy itself and (ii) applying tree kernels (TK) to them. Such a TK-based reranker selects the best hierarchical test hypothesis, which is naturally represented as a labeled tree. Additionally, to better investigate the role of category relationships, we consider two interesting cases: (i) traditional schemes, in which node-fathers include all the documents of their child-categories; and (ii) more general schemes, in which children can include documents not belonging to their fathers.
Second, we propose an efficient local incremental reranking model (LIR), which combines the top-down method with a local reranking model for each sub-problem. These local rerankers improve accuracy by absorbing the local category dependencies of the sub-problems, which alleviates the errors of the top-down method in the higher levels of the hierarchy. LIR deals with the sub-problems recursively, applying the corresponding local rerankers in top-down fashion, resulting in high efficiency. In addition, we further optimize LIR by (i) improving the top-down method by creating local dictionaries for each sub-problem; (ii) using LIBLINEAR instead of LIBSVM; and (iii) adopting a compact representation of hypotheses for learning the local reranking model. This makes LIR applicable to large-scale hierarchical text categorization. Experiments on different hierarchical datasets have shown promising improvements from exploiting structural dependencies in large-scale hierarchical text categorization.
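The top-down strategy underlying the abstract can be sketched minimally: a classifier sits at each internal node of the hierarchy, and a document is routed down from the root one level at a time. The tree, keyword sets, and overlap "classifiers" below are hypothetical stand-ins for the per-node classifiers a real system would train; the thesis's LIR model would additionally rerank the local decisions at each step.

```python
def top_down_classify(doc, tree, classifiers, node="root"):
    """Route `doc` from `node` down to a leaf category.

    tree:        dict mapping each internal node to its list of children
    classifiers: dict mapping each internal node to a scoring function
                 (doc, child) -> score for routing this document
    """
    children = tree.get(node, [])
    if not children:            # reached a leaf: final category
        return node
    score = classifiers[node]   # local decision for this sub-problem
    best_child = max(children, key=lambda c: score(doc, c))
    return top_down_classify(doc, tree, classifiers, best_child)

# toy two-level hierarchy with keyword-overlap "classifiers"
tree = {"root": ["sci", "arts"],
        "sci": ["physics", "biology"],
        "arts": []}
keywords = {"sci": {"quark", "cell"}, "arts": {"opera"},
            "physics": {"quark"}, "biology": {"cell"}}

def overlap(doc, cat):
    """Score a category by keyword overlap with the document."""
    return len(set(doc.split()) & keywords.get(cat, set()))

clf = {"root": overlap, "sci": overlap}
label = top_down_classify("the quark model", tree, clf)
```

The appeal of this scheme, as the abstract notes, is efficiency: each document touches only one path of classifiers instead of all of them, at the cost of unrecoverable errors near the root, which is exactly what the local rerankers are meant to mitigate.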

Add to bibliography