Selected scholarly literature on the topic "Classification texte"

Author: Grafiati

Published: 25 May 2024

Contents

Journal articles
Theses / dissertations
Books
Book chapters
Conference papers
Organizational reports

See the list of current articles, books, theses, conference proceedings, and other scholarly sources relevant to the topic "Classification texte".

Journal articles on the topic "Classification texte"

Garrido, Carlos. "Deficiencias del texto de partida en la traducción de textos destinados a la enseñanza y divulgación de la ciencia". Meta 60, no. 3 (April 5, 2016): 454–75. http://dx.doi.org/10.7202/1036138ar.

Abstract:

This article precisely defines and delimits the concept of a deficiency, or defect, in the source text in the communicative translation of didactic and popular-science texts, and distinguishes it from neighbouring concepts (such as passages that prompt the translator to introduce improvements, or passages that are inexact because of legitimate pedagogical simplification). The article then proposes a specific classification of these deficiencies, based on a large sample of translations into Spanish and Portuguese, both published and unpublished (produced by the author), of excerpts from encyclopedia articles, textbooks, and popular-science books and articles written in English and German. The proposed classification covers deficiencies that frequently appear in source texts intended for science teaching and popularization and that must be corrected in the body of the target text. At the top hierarchical level it distinguishes factual deficiencies, which conflict with the designation of known truth or with an adequate correspondence between the verbal and iconic components of the text (six subcategories), from formal deficiencies, which impair expressive rigour or communicative effectiveness (eight subcategories). Finally, the article analyses the various cognitive demands involved in detecting and correcting such deficiencies in communicative translation.

Hoek, Leo H. "Timbres-poste et intermédialité". Protée 30, no. 2 (July 9, 2003): 33–44. http://dx.doi.org/10.7202/006729ar.

Abstract:

The classification and interpretation of intermedial texts, that is, texts combining text with image, depend on the point of view adopted toward the communication situation: either the production or the reception of such texts. The production of an intermedial text is simultaneous in some cases (posters, comics, advertisements) and consecutive in others (art criticism, ekphrasis, illustrations). The reception of an intermedial text is simultaneous in most cases (illustrations, posters, advertisements, comics) and consecutive in certain particular cases (art criticism, ekphrasis). On the basis of these two criteria, simultaneity and consecutiveness, different degrees of entanglement of text and image in intermedial discourse can be distinguished. A third criterion, distinctiveness (the possibility of physically separating the text from the image), then makes it possible to distinguish four degrees of increasing intricacy: transposition, juxtaposition, combination, and fusion of text and image. Such a categorization leads to a theoretical construction of virtual types of entanglement and intermediality. It turns out that a limited number of categories allows any occurrence of intermedial discourse to be defined as a specific combination of intermedial types, and that any occurrence of intermedial discourse can be defined in terms of these categories. This does not mean that occurrences necessarily conform to, or are limited to, the categories distinguished. These do not in fact constitute a limited number of types of intermediality, since many other combinations are possible. The commemorative postage stamp, an almost always intermedial discourse, is a perfect example both of the descriptive power of the proposed categorization and of the specific artistic creativity deployed in each stamp. An analysis of the relations between text and image in a series of Dutch stamps shows the effectiveness of the categorization and the possibility of combining different categories of intermedial discourse in a single stamp.

Hurley, Robert. "Le genre « évangile » en fonction des effets produits par la mise en intrigue de Jésus". Laval théologique et philosophique 58, no. 2 (November 27, 2002): 243–57. http://dx.doi.org/10.7202/000359ar.

Abstract:

What is the literary genre of the Gospels? Are they biographies? Heroic legends? Should they all be classed in the same genre? In seeking answers to these questions, biblical criticism has often resorted to a comparative-literature approach, looking for resemblances between the Gospels and other ancient works. It tries to identify formal resemblances between two fixed contents, between two "apparently" finite corpora. Once the idea of the text as a static object is abandoned in favour of a dynamic conception of the text as an event experienced by the reader, this approach runs into major obstacles. The author argues that the generic classification of the Gospels requires a more subtle procedure than simple formal or thematic comparison of texts, and tries to show that the genre of a text cannot be specified without reference to the effects produced in the reader.

GARON-AUDY, Muriel. "La logique de l’acte de classification : postulat ou question pour l’analyse de la mobilité". Sociologie et sociétés 8, no. 2 (September 30, 2002): 37–60. http://dx.doi.org/10.7202/001283ar.

Abstract:

This text revisits a series of postulates on which mobility analyses rest, in particular the fundamental social unanimity that is supposed to govern orientations within a social formation. The author attempts to show that the criteria of desirability are still poorly established and that, in any case, the idea of unanimity about them is ill-founded; hence the danger of defining a rationality in the process of social success that implies unanimity about the scale of the desirable. In order to eliminate the danger of the postulate of social unanimity, which rules out any analysis of meaning, and also to indicate the directions this type of analysis can take, the author then presents the conclusions of a study that makes it possible to define, from texts written in class by students of various social origins, the set of categories they use to classify the people around them. This analysis demonstrates the ideological charge that invests the rationality of "petit-bourgeois" discourse at this elementary level of the definition of basic categories, and the resistance that working-class discourse opposes to it.

Cazé, Antoine. "Poésie, texte métissé, une lecture de David Antin". Recherches anglaises et nord-américaines 21, no. 1 (1988): 21–28. http://dx.doi.org/10.3406/ranam.1988.1187.

Abstract:

David Antin's texts are a challenge to genre classification. Antin is one of those performing artists who, in their improvisations, try to come as close as possible to the moment when their personal voice alters the language they speak and the world this language creates. The transcripts of the improvisations (public talks, radio programs, etc.) are reworked by the poet into printed material. The broken threads of Antin's voice weave a portrait of the artist which is never completed but only looms in the background, somewhere between poetry and prose, defying closure.

Bonafin, Massimo. "Fra filologia e antropologia: la genesi del lupo e della volpe". Reinardus / Yearbook of the International Reynard Society 11 (November 15, 1998): 25–35. http://dx.doi.org/10.1075/rein.11.03bon.

Abstract:

Branch 24 of the Roman de Renart has not fared well among critics, yet it offers many points of philological and hermeneutic interest. This article analyses the various aspects of the text of the branch, which is transmitted only by manuscripts BCMn. It considers in particular: 1. the structures of the narrative (prologue, genesis, childhood), 2. the problem of sources (Aucupre, biblical tradition, other branches), 3. the anthropological substrata (creation and classification of animals, kinship between Renart and Isengrin), 4. the possibility of errors in the textual tradition, 5. the significance of the anthroponyms Renart and Isengrin. This critical re-examination is the necessary step toward the full restoration of the meaning of a text as distinctive as branch 24.

Zakharia, Katia. "Les Mille et une nuits. Histoire du texte et classification des contes". Arabica 56, no. 1 (2009): 132–34. http://dx.doi.org/10.1163/157005809x398708.

Piscini, Gianluca. "Sources, cibles et structure de deux réflexions de Porphyre sur l’athéisme (commentaire sur le Timée, fragment 28 Sodano ; Lettre à Marcella 21-23)". Revue des Études Grecques 134, no. 1 (2021): 143–75. http://dx.doi.org/10.3406/reg.2021.8674.

Abstract:

Fragment 28 Sodano of the Commentary on the Timaeus and chapters 21-23 of Porphyry's Letter to Marcella both contain a classification of the causes of atheism. This study examines these two Porphyrian reflections, which until now have been neither examined in detail nor compared. To that end, it begins with a brief analysis of two other similar texts. Porphyry in fact took up a list of the causes of impiety drawn up by Plato in the Laws, which had already been used by Origen in his treatise On Prayer. Two features stand out: an ordered structuring of the causes of atheism, which form a gradation, and an updating of the Platonic text by Origen. Both also characterize Porphyry's reflection. Porphyry nevertheless shows remarkable originality, completing the taxonomy proposed by his predecessors and above all, in the Letter to Marcella, structuring the text so as to reinforce the opposition between the different philosophical opinions he describes. Studying the affinities between the two Porphyrian texts and the construction of Letter 21-23 sheds new light on the question of the philosopher's targets, identified by some with the Christians.

Gagnon, Mathieu. "Penser la question des rapports aux savoirs en éducation : clarification et besoin de recherches conceptuelles". Les ateliers de l'éthique 6, no. 1 (March 28, 2018): 30–42. http://dx.doi.org/10.7202/1044300ar.

Abstract:

This text examines the question of relations to knowledge by highlighting conceptual issues, to which educational and ethical issues are related. In this regard, the author proposes an attempt at classification and organization drawing, in particular, on four types of relations to knowledge.

Caparros, Ernest. "La nature juridique commune du patrimoine familial et de la société d’acquêts". Revue générale de droit 30, no. 1 (December 1, 2014): 1–60. http://dx.doi.org/10.7202/1027599ar.

Abstract:

The text presents an effort to clarify concepts and to qualify and classify the family patrimony and the partnership of acquests in the new Civil Code of Québec. It establishes the common nature of these two regimes as secondary legal regimes, the first imperative, the second suppletive.

Theses / dissertations on the topic "Classification texte"

Tisserant, Guillaume. "Généralisation de données textuelles adaptée à la classification automatique". Thesis, Montpellier, 2015. http://www.theses.fr/2015MONTS231/document.

Abstract:

The classification of text documents is a relatively old task. Early on, many documents of different kinds were gathered in order to centralize knowledge, and classification and indexing systems were created so that documents could be found easily according to readers' needs. With the growing number of documents and the advent of computers and then the internet, implementing text classification systems has become a crucial issue. However, textual data, being complex and rich, are difficult to process automatically. In this context, this thesis proposes an original methodology for organizing textual information so as to facilitate access to it. Our approaches to automatic text classification and to semantic information extraction make it possible to retrieve the information sought quickly and relevantly. More specifically, this manuscript presents new forms of text representation that facilitate processing for automatic classification tasks. A method for partial generalization of textual data (the GenDesc approach), based on statistical and morphosyntactic criteria, is proposed. The thesis also addresses the construction of phrases and the use of semantic information to improve document representation. Numerous experiments demonstrate the relevance and genericity of our proposals, which improve classification results. Finally, in the context of rapidly growing social networks, a method for automatically generating semantically meaningful hashtags is proposed. This approach relies on statistical measures, semantic resources, and syntactic information. The generated hashtags can then be exploited for information retrieval tasks over large volumes of data.
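
The idea of partially generalizing text before classification can be illustrated with a minimal sketch. It is not the GenDesc method itself: the scoring rule (share of a word's occurrences that fall in its dominant class), the threshold, and the tiny `POS` lexicon are all assumptions made for illustration. Words that discriminate poorly between classes are replaced by a part-of-speech tag, while discriminative words are kept as they are.

```python
from collections import Counter, defaultdict

# Hypothetical toy part-of-speech lexicon; a real system would use a tagger.
POS = {"the": "DET", "a": "DET", "film": "NOUN", "plot": "NOUN",
       "actor": "NOUN", "was": "VERB", "is": "VERB", "great": "ADJ",
       "awful": "ADJ", "boring": "ADJ", "superb": "ADJ"}

def discriminance(docs, labels):
    """Score each word by the share of its occurrences in its dominant class.

    This statistic is an illustrative assumption, not the thesis's criterion.
    """
    per_class = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        per_class[label].update(tokens)
    total = Counter()
    for counts in per_class.values():
        total.update(counts)
    return {w: max(c[w] for c in per_class.values()) / total[w] for w in total}

def generalize(tokens, scores, threshold=0.8):
    """Keep discriminative words; replace the others by their POS tag."""
    out = []
    for w in tokens:
        if scores.get(w, 0.0) >= threshold:
            out.append(w)                    # informative word: keep as is
        else:
            out.append(POS.get(w, "UNK"))    # generalize to a tag
    return out

docs = [["the", "film", "was", "great"],
        ["the", "plot", "was", "awful"],
        ["a", "superb", "actor"],
        ["a", "boring", "film"]]
labels = ["pos", "neg", "pos", "neg"]

scores = discriminance(docs, labels)
generalized = [generalize(d, scores) for d in docs]
print(generalized[0])
```

On this toy corpus, words that occur equally in both classes ("the", "film", "was") are generalized, while class-specific words such as "great" survive; the generalized token sequences could then be fed to any standard classifier.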

Danuser, Hermann. "Der Text und die Texte. Über Singularisierung und Pluralisierung einer Kategorie". Bärenreiter Verlag, 1998. https://slub.qucosa.de/id/qucosa%3A36795.

Vasil'eva, Natalija. "Eigennamen in der Welt zeitgenössischer Texte". Gesellschaft für Namenkunde e.V, 2007. https://ul.qucosa.de/id/qucosa%3A31517.

Abstract:

The author presents her monograph 'Proper names in the world of text' (original language Russian), in which an integrative approach is proposed, based on principles of text linguistics, narrative theory and literary onomastics. The immediate environments of proper names (microtextology) on the one hand, and the whole text as a space for the realization and functioning of proper names (macrotextology) on the other, are investigated using modern Russian fiction as material. Some new concepts and terms are introduced and interpreted: onymic information, onymic anticipation and retardation (as the main text strategies for introducing names), and the deconstructive function of the proper name in text (in addition to the proper name functions defined by D. LAMPING). The concept of namelessness in fiction and various metamorphoses of names in slang are also discussed.

Lasch, Alexander. "Texte im Handlungsbereich der Religion". De Gruyter, 2011. https://tud.qucosa.de/id/qucosa%3A74840.

Abstract:

A typology of texts to be assigned to the domain of "religion" faces several problems that do not arise for texts from 'profane' or 'secular' areas of discourse. First, the demarcation from the 'profane' or 'secular', and with it the question of why linguistic units are classified as 'religious', is contested. Second, the communicative character of the domain of "religion" is anything but easy to characterize briefly; this is essentially bound up with the question of who is responsible for a text and its communication (and/or its performance), at what time and in what place. The last question concerns the communicative particularities of the situations in which texts ascribed to the domain of "religion" are communicated. Since a typology of communication for the domain of "religion" is still lacking, this article attempts to sketch basic communicative constellations that are indispensable for linguistic description. The aim here is therefore not to examine the text-type traditions of different faith communities according to how they have been handed down or their status within those communities, but to ask what the fundamental communicative constellations of the domain of "religion" are and under what conditions texts can be communicated within it. [From the introduction]

Sayadi, Karim. "Classification du texte numérique et numérisé. Approche fondée sur les algorithmes d'apprentissage automatique". Thesis, Paris 6, 2017. http://www.theses.fr/2017PA066079/document.

Abstract:

Several disciplines in the humanities, such as philology and palaeography, face complex and time-consuming tasks when examining data sources. Computational approaches in the humanities make it possible to address problems such as reading, analysis, and archiving in a systematic way. The conceptual models developed rely on algorithms, which give rise to software implementations that automate these tedious tasks. The first part of the thesis aims, on the one hand, to establish the thematic structure of a corpus by building high-dimensional semantic spaces based on topic modeling and, on the other hand, to track topics dynamically, which is a real scientific challenge, notably because of scaling. The second part of the thesis treats the page of a digitized document holistically, without any prior intervention, in order to classify historical documents according to their script. The goal is to automatically learn representations of the handwriting stroke, or of the tracing of one script relative to that of another. This requires taking into account the environment in which the stroke is found: image, artifacts, noise due to deterioration of the paper, and so on. The approach proposes a novel representation learning method based on stacking convolutional auto-encoders, in order to provide an alternative representation of the input data.

Schneider, Ulrich Johannes. "Über Tempel und Texte: ein Bildervergleich". Fink, 1999. https://ul.qucosa.de/id/qucosa%3A12768.

Abstract:

The epochal threshold between the 18th and 19th centuries consists in a step from historical reconstruction to hermeneutic interpretation; this, at least, is what the history of hermeneutics and the history of historiography show. Historical images (of philosophy, of mythology, in general) were drafted and revised at that time, and they can still be observed today in the way philosophy is handled. The dispute over the significance of texts for philosophy appears to have been decided at this threshold: the relation of immanence replaces the relation of transcendence. Texts are places of philosophy, not means. But how does this replacement take shape? Is it a consequence, an inference; does it form an immanent logic of its own, something like the logic of the historical image of philosophy? In what follows, a comparison of images will help clarify what philosophical texts are, even though the images cited show temples. Both images can be assigned to the epochal threshold that is decisive for our present-day philosophical self-understanding.

AKAMA, HIROYUKI. "Tableau, corps, texte : etudes historiques sur la classification-recit en france au xixe siecle". Paris 1, 1992. http://www.theses.fr/1992PA010604.

Abstract:

Through the "narrative" called ideology (that of Cabanis and Tracy), the history of classification in the nineteenth century can be divided into three distinct stages, each with its own figurative (or non-figurative) embodiments of the "table-body-text" (tableau-corps-texte) complex. First, an epistemological rupture of the "table" (tableau), which belonged to the element of ideology, and consequently the phenomena of "inner dia-textuality" that suppressed the possibilities of this current of thought. Second, the appearance of the encyclopedic space of knowledge, which took shape with the "cone" of classification and then with its transformations, intended to embody the "positivist" meanings of sociology and anthropology. Third, a new rupture of the "table", as a result of which these spaces became "fluid" and appeared as symbols of the anti-positivist crisis of science, leading finally to a tripartition of discourses (discours): hysteria, science fiction, and the institute-university.

Felhi, Mehdi. "Document image segmentation : content categorization". Thesis, Université de Lorraine, 2014. http://www.theses.fr/2014LORR0109/document.

Abstract:

This thesis addresses the problem of document image segmentation and proposes new approaches for detecting and classifying document contents. The first part studies skew estimation for scanned documents. The aim is an automatic approach able to estimate precisely the skew angle of text in document images. One method is based on Maximum Gradient Difference (MGD) and the R-signature; a second is based on the Ridgelet transform. The second contribution is a new hybrid page segmentation approach. A stroke-based descriptor detects text and line candidates using the skeleton of the binarized document image. An active contour model is then applied to segment the rest of the image into photo and background regions. Finally, text candidates are clustered by size using mean-shift analysis. The method is applied to segmenting scanned document images (newspapers and magazines) that contain text, lines, and photo regions. The last part of the thesis is devoted to text detection in photos and posters, using a set of text descriptors based on stroke characteristics. The approach begins by extracting connected components and selecting text character candidates over the CIE LCH color space, using Histogram of Oriented Gradients (HOG) correlation coefficients in order to detect low-contrast regions. Characters belonging to the same text line (word or sentence) are grouped using two different methods: a depth-first search over a graph, and a stable text line criterion. Finally, the results are refined by classifying the text line candidates into "text" and "non-text" regions using a kernel Support Vector Machine (K-SVM) classifier.
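
The grouping of character candidates into text lines by depth-first search over a graph can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the graph construction, so the adjacency rule used here (bounding boxes that overlap vertically and lie within `max_gap` pixels horizontally) and the `(x, y, width, height)` box format are hypothetical choices.

```python
def linked(a, b, max_gap=10):
    """Hypothetical adjacency rule: vertical overlap and small horizontal gap."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    vertical_overlap = min(ay + ah, by + bh) - max(ay, by)
    gap = max(ax, bx) - min(ax + aw, bx + bw)  # negative if boxes overlap
    return vertical_overlap > 0 and gap <= max_gap

def group_into_lines(boxes, max_gap=10):
    """Return connected components of the adjacency graph, one per text line."""
    n = len(boxes)
    adjacency = {
        i: [j for j in range(n) if j != i and linked(boxes[i], boxes[j], max_gap)]
        for i in range(n)
    }
    seen, lines = set(), []
    for start in range(n):
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:                      # iterative depth-first search
            node = stack.pop()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        # Order the characters of a line from left to right.
        lines.append(sorted(component, key=lambda i: boxes[i][0]))
    return lines

# Toy character boxes as (x, y, width, height): two lines and one isolated box.
boxes = [(0, 0, 8, 10), (12, 1, 8, 10), (24, 0, 8, 10),
         (0, 40, 8, 10), (11, 41, 8, 10), (100, 0, 8, 10)]
text_lines = group_into_lines(boxes)
print(text_lines)
```

Each resulting component is a candidate text line; in a pipeline like the one described, such candidates would then be passed to a text/non-text classifier.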

Mazyad, Ahmad. "Contribution to automatic text classification : metrics and evolutionary algorithms". Thesis, Littoral, 2018. http://www.theses.fr/2018DUNK0487/document.

Texto completo da fonte

Resumo:

This thesis deals with natural language processing and text mining, at the intersection of machine learning and statistics. We are particularly interested in term weighting schemes (TWS) in the context of supervised learning, and specifically in the text classification (TC) task. In TC, the multi-label classification task has attracted a lot of interest in recent years. Multi-label classification from textual data appears in many modern applications, such as news classification, where the task is to find the categories that a newswire story belongs to (e.g., politics, Middle East, oil) based on its textual content; music genre classification (e.g., jazz, pop, oldies, traditional pop) based on customer reviews; film classification (e.g., action, crime, drama); and product classification (e.g., electronics, computers, accessories). Traditional classification algorithms are generally binary classifiers and are not suited to multi-label classification. The multi-label classification task is therefore usually transformed into multiple single-label binary tasks. However, this transformation introduces several issues. First, term distributions are considered only with respect to the positive and the negative categories (i.e., information on the correlations between terms and categories is lost). Second, it fails to consider any label dependency (i.e., information on existing correlations between classes is lost). Finally, since all categories but one are grouped into a single category (the negative category), the newly created tasks are imbalanced. This information is commonly used by supervised TWS to improve the effectiveness of the classification system. Hence, after presenting the process of multi-label text classification, and more particularly the TWS, we make an empirical comparison of these methods applied to the multi-label text classification task. We find that the superiority of the supervised methods over the unsupervised methods is still not clear. We then show that these methods are not fully adapted to the multi-label classification problem and that they ignore much statistical information that could be used to improve the classification results. Thus, we propose a new TWS based on information gain. This new method takes into consideration the term distribution, not only with respect to the positive and the negative categories but also in relation to all the other categories. Finally, aiming at finding specialized TWS that also solve the issue of imbalanced tasks, we studied the benefits of using genetic programming to generate TWS for the text classification task. Unlike previous studies, we generate formulas by combining statistical information at a microscopic level (e.g., the number of documents that contain a specific term) instead of using complete TWS. Furthermore, we make use of categorical information (e.g., the number of categories in which a term occurs). Experiments are carried out to measure the impact of these methods on the performance of the model. We show through these experiments that the results are positive.
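
The transformation described in this abstract can be illustrated with a minimal sketch. It is not the thesis's proposed weighting scheme: it only shows the standard one-binary-task-per-label transformation and a textbook information gain score computed from document counts. The toy corpus and all function names are hypothetical.

```python
import math

# Toy multi-label corpus (hypothetical data, not taken from the thesis):
# each document is (set of terms, set of labels).
docs = [
    ({"oil", "price", "opec"}, {"oil", "middle-east"}),
    ({"election", "vote"}, {"politics"}),
    ({"oil", "embargo", "vote"}, {"oil", "politics"}),
    ({"pipeline", "oil"}, {"oil"}),
    ({"treaty", "vote"}, {"politics", "middle-east"}),
]


def binary_relevance(docs):
    """Turn one multi-label task into one binary task per label."""
    labels = sorted({label for _, ls in docs for label in ls})
    return {label: [(terms, label in ls) for terms, ls in docs] for label in labels}


def entropy(pos, neg):
    """Binary entropy (in bits) of a split with pos/neg document counts."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def information_gain(task, term):
    """Information gain of a term's presence for one binary task."""
    n = len(task)
    pos = sum(1 for _, y in task if y)
    with_term = [y for terms, y in task if term in terms]
    without_term = [y for terms, y in task if term not in terms]
    h_cond = 0.0
    for part in (with_term, without_term):
        if part:
            p = sum(part)  # number of positive documents in this part
            h_cond += len(part) / n * entropy(p, len(part) - p)
    return entropy(pos, n - pos) - h_cond


tasks = binary_relevance(docs)
for label, task in tasks.items():
    positives = sum(y for _, y in task)
    print(label, "positives:", positives, "negatives:", len(task) - positives)
```

Each binary task sees only "this label" versus "everything else", which is why the per-label scores cannot reflect how a term is distributed across the remaining categories.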

Bastos, Dos Santos José Eduardo. "L'identification de texte en images de chèques bancaires brésiliens". Compiègne, 2003. http://www.theses.fr/2003COMP1453.

Abstract:

Identifying and distinguishing text in document images are tasks whose current solutions rely mainly on contextual information, such as layout information or information about the physical structure. This research investigates an alternative based only on features extracted from the textual elements themselves, which makes the process more independent. The whole process was developed with textual elements fragmented into small portions (samples), in order to provide an alternative solution to issues such as scale and overlapping textual elements. From these samples, a set of features is extracted and serves as input to a classifier whose main tasks are extracting text from the document and distinguishing handwritten from machine-printed text. Moreover, since the only information used is observed directly from the textual elements, the process is more independent, as it relies on no heuristics or a priori information about the document being processed. Results of around 93% correct classification confirm the efficacy of the process.

Books on the topic "Classification texte"

Les mille et une nuits: Histoire du texte et classification des contes. Paris: L'Harmattan, 2008.

Anne-Elizabeth, Dalcq, ed. Mettre de l'ordre dans ses idées: Classification des articulations logiques pour structurer son texte. Paris: Duculot, 1999.

Les parenthèses dans l'Evangile de Jean: Aperçu historique et classification, texte grec de Jean. Leuven: University Press, 1985.

Töppe, Frank. Im Zeichen der drei goldenen Haare: Der Teufel mit den drei goldenen Haaren, der Vogel Greif : die Beziehungen beider Texte zueinander, zu den entsprechenden Texten der Erstauflage der Grimmschen Märchen und einer F. v. Arnimschen Überlieferung. Meerbusch-Büderich bei Düsseldorf: Edition Vogelmann, 1993.

Descartes, René. Discours de la méthode: Texte intégral. Paris: Hatier, 1990.

Quintilian. De institutione oratoria, liber primus: Texte latin publié avec des notes biographiques sur Quintilien, l'histoire de l'institution oratoire et de ses abrégés, la classification et la description des manuscrits, le texte abrégé par Étienne de Rouen et par Jean Racine. Des notes critiques les variantes principales par Ch. Fierville. Paris: Firmin-Didot, 1991.

Genome clustering: From linguistic models to classification of genetic texts. Berlin: Springer, 2010.

Support vector machines for pattern classification. 2nd ed. London: Springer, 2010.

Almodóvar, Antonio Rodríguez. El texto infinito: Ensayos sobre el cuento popular. Madrid: Fundación Germán Sánchez Ruipérez, 2004.

Los cuentos populares o la tentativa de un texto infinito. [Murcia]: Universidad de Murcia, 1989.

Book chapters on the topic "Classification texte"

Féron, Corinne. "Classification des adverbiaux du moyen français: l’exemple des expressions formées sur vérité, voir, vrai et certain". In Texte, Codex & Contexte, 123–33. Turnhout: Brepols Publishers, 2007. http://dx.doi.org/10.1484/m.tcc-eb.3.3946.

Joachims, Thorsten. "Text Classification". In Learning to Classify Text Using Support Vector Machines, 7–33. Boston, MA: Springer US, 2002. http://dx.doi.org/10.1007/978-1-4615-0907-3_2.

Sarkar, Dipanjan. "Text Classification". In Text Analytics with Python, 167–215. Berkeley, CA: Apress, 2016. http://dx.doi.org/10.1007/978-1-4842-2388-8_4.

Ali-Ahmed, Syed Toufeeq. "Text Classification". In Encyclopedia of Systems Biology, 2156. New York, NY: Springer New York, 2013. http://dx.doi.org/10.1007/978-1-4419-9863-7_174.

Sarkar, Dipanjan. "Text Classification". In Text Analytics with Python, 275–342. Berkeley, CA: Apress, 2019. http://dx.doi.org/10.1007/978-1-4842-4354-1_5.

Sahin, Özgür. "Text Classification". In Develop Intelligent iOS Apps with Swift, 41–67. Berkeley, CA: Apress, 2020. http://dx.doi.org/10.1007/978-1-4842-6421-8_3.

Zong, Chengqing, Rui Xia and Jiajun Zhang. "Text Classification". In Text Data Mining, 93–124. Singapore: Springer Singapore, 2021. http://dx.doi.org/10.1007/978-981-16-0100-2_5.

Wasserman, Larry. "Classification". In Springer Texts in Statistics, 349–79. New York, NY: Springer New York, 2004. http://dx.doi.org/10.1007/978-0-387-21736-9_22.

James, Gareth, Daniela Witten, Trevor Hastie and Robert Tibshirani. "Classification". In Springer Texts in Statistics, 127–73. New York, NY: Springer New York, 2013. http://dx.doi.org/10.1007/978-1-4614-7138-7_4.

James, Gareth, Daniela Witten, Trevor Hastie and Robert Tibshirani. "Classification". In Springer Texts in Statistics, 129–95. New York, NY: Springer US, 2021. http://dx.doi.org/10.1007/978-1-0716-1418-1_4.

Conference papers on the topic "Classification texte"

Yang, Yi, Hongan Wang, Jiaqi Zhu, Yunkun Wu, Kailong Jiang, Wenli Guo and Wandong Shi. "Dataless Short Text Classification Based on Biterm Topic Model and Word Embeddings". In Twenty-Ninth International Joint Conference on Artificial Intelligence and Seventeenth Pacific Rim International Conference on Artificial Intelligence {IJCAI-PRICAI-20}. California: International Joint Conferences on Artificial Intelligence Organization, 2020. http://dx.doi.org/10.24963/ijcai.2020/549.

Abstract:

Dataless text classification has attracted increasing attention recently. It needs only a few seed words per category to classify documents, which is much cheaper than supervised text classification, which requires massive labeling effort. However, most existing models focus on long texts and perform unsatisfactorily on short texts, which have become increasingly popular on the Internet. In this paper, we first propose a novel model named Seeded Biterm Topic Model (SeedBTM), which extends BTM to solve the problem of dataless short text classification with seed words. It takes advantage of both word co-occurrence information in the topic model and category-word similarity from widely used word embeddings as the prior topic-in-set knowledge. Moreover, with the same approach, we also propose Seeded Twitter Biterm Topic Model (SeedTBTM), which extends Twitter-BTM and utilizes additional user information to achieve higher classification accuracy. Experimental results on five real short-text datasets show that our models outperform the state-of-the-art methods and perform especially well when the categories are overlapping and interrelated.

Kavi, Deniz. "Towards Adversarial Genetic Text Generation". In 8th International Conference on Computer Science and Information Technology (CoSIT 2021). AIRCC Publishing Corporation, 2021. http://dx.doi.org/10.5121/csit.2021.110407.

Abstract:

Text generation is the task of generating natural language and producing outputs similar to or better than human texts. Due to deep learning's recent success in the field of natural language processing, computer-generated text has come closer to becoming indistinguishable from human writing. Genetic algorithms have not been as popular in the field of text generation. We propose a genetic algorithm combined with text classification and clustering models that automatically grade the texts generated by the genetic algorithm. The genetic algorithm is given poorly generated texts from a Markov chain; these texts are then graded by a text classifier and a text clustering model. We then apply crossover to pairs of texts, with emphasis on those that received higher grades. Changes to the grading system and further improvements to the genetic algorithm are to be the focus of future research.

Souza, Luiz Fernando Spillere de, and Alexandre Leopoldo Gonçalves. "UTILIZAÇÃO PRÁTICA DE WORD EMBEDDING APLICADA À CLASSIFICAÇÃO DE TEXTO". In Congresso Internacional de Conhecimento e Inovação (ciKi). Congresso Internacional de Conhecimento e Inovação (ciKi), 2020. http://dx.doi.org/10.48090/ciki.v1i1.899.

Abstract:

Text classification aims to extract knowledge from unstructured text patterns. Word embedding is a representation technique that allows words with similar meanings to have similar representations, so as to capture characteristics of their use and meaning. The aim of this article is to analyze the work already published on the use of word embeddings applied to text classification and to propose a practical application that demonstrates their effectiveness. This study contributes evidence for the effectiveness of word embeddings applied to text classification, having reached an accuracy of around 73%.

Huang, Ting, Gehui Shen and Zhi-Hong Deng. "Leap-LSTM: Enhancing Long Short-Term Memory for Text Categorization". In Twenty-Eighth International Joint Conference on Artificial Intelligence {IJCAI-19}. California: International Joint Conferences on Artificial Intelligence Organization, 2019. http://dx.doi.org/10.24963/ijcai.2019/697.

Abstract:

Recurrent Neural Networks (RNNs) are widely used in the field of natural language processing (NLP), ranging from text categorization to question answering and machine translation. However, RNNs generally read the whole text from beginning to end, or sometimes in reverse, which makes them inefficient at processing long texts. When reading a long document for a categorization task, such as topic categorization, large quantities of words are irrelevant and can be skipped. To this end, we propose Leap-LSTM, an LSTM-enhanced model that dynamically leaps between words while reading texts. At each step, we utilize several feature encoders to extract messages from the preceding text, the following text and the current word, and then determine whether to skip the current word. We evaluate Leap-LSTM on several text categorization tasks: sentiment analysis, news categorization, ontology classification and topic classification, with five benchmark data sets. The experimental results show that our model reads faster and predicts better than a standard LSTM. Compared to previous models that can also skip words, our model achieves better trade-offs between performance and efficiency.

Chaika, M., I. Buneev and V. Velichko. "ANALYSIS OF THE EFFECTIVENESS OF APPLYING THE FREQUENCY-CONTEXT CLASSIFICATION ALGORITHM TO TEXTS OF DIFFERENT STYLES". In Modern aspects of modeling systems and processes. FSBE Institution of Higher Education Voronezh State University of Forestry and Technologies named after G.F. Morozov, 2021. http://dx.doi.org/10.34220/mamsp_174-178.

Abstract:

This article discusses the application of the frequency-context classification algorithm to texts of various styles. The main features of different styles that affect the efficiency of the algorithm are highlighted. It is shown that the method of selecting the subject of a text using the frequency-context classification algorithm works best for scientific and legal documents and, in its current form, is practically inapplicable to literary texts. This makes the task of modifying the algorithm to determine the subject of literary texts relevant.

Ouyang, Jihong, Yiming Wang, Ximing Li and Changchun Li. "Weakly-supervised Text Classification with Wasserstein Barycenters Regularization". In Thirty-First International Joint Conference on Artificial Intelligence {IJCAI-22}. California: International Joint Conferences on Artificial Intelligence Organization, 2022. http://dx.doi.org/10.24963/ijcai.2022/468.

Abstract:

Weakly-supervised text classification aims to train predictive models with unlabeled texts and a few representative words of classes, referred to as category words, rather than labeled texts. Such weak supervision is much cheaper and easier to collect in real-world scenarios. To resolve this task, we propose a novel deep classification model, namely Weakly-supervised Text Classification with Wasserstein Barycenter Regularization (WTC-WBR). Specifically, we initialize the pseudo-labels of texts by using the category word occurrences, and formulate a weakly self-training framework to iteratively update the weakly-supervised targets by combining the pseudo-labels with the sharpened predictions. Most importantly, we suggest a Wasserstein barycenter regularization with the weakly-supervised targets on the deep feature space. The intuition is that the texts tend to be close to the corresponding Wasserstein barycenter indicated by weakly-supervised targets. Another benefit is that the regularization can capture the geometric information of the deep feature space to boost the discriminative power of deep features. Experimental results demonstrate that WTC-WBR outperforms the existing weakly-supervised baselines and achieves performance comparable to semi-supervised and supervised baselines.
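
The pseudo-label initialization and target update mentioned in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the WTC-WBR implementation: the temperature-based sharpening, the convex combination with weight `alpha`, and all names and data are hypothetical, and the Wasserstein barycenter regularization is not shown.

```python
def init_pseudo_labels(tokens, category_words):
    """Initial soft label: normalized counts of category-word occurrences."""
    counts = [sum(tok in words for tok in tokens) for words in category_words.values()]
    total = sum(counts)
    k = len(counts)
    # Fall back to a uniform distribution when no category word occurs.
    return [c / total for c in counts] if total else [1.0 / k] * k


def sharpen(probs, temperature=0.5):
    """Temperature sharpening (assumed form): raise to 1/T and renormalize."""
    powered = [p ** (1.0 / temperature) for p in probs]
    z = sum(powered)
    return [p / z for p in powered]


def update_targets(pseudo, predicted, alpha=0.5):
    """Assumed update rule: convex combination of pseudo-labels and sharpened predictions."""
    sharp = sharpen(predicted)
    return [alpha * q + (1 - alpha) * s for q, s in zip(pseudo, sharp)]


# Hypothetical category words and one tokenized text.
category_words = {"sports": {"match", "goal"}, "finance": {"stock", "bank"}}
tokens = "the bank raised its stock outlook after the match".split()

pseudo = init_pseudo_labels(tokens, category_words)
targets = update_targets(pseudo, predicted=[0.4, 0.6])
print(pseudo, targets)
```

In a self-training loop, `predicted` would come from the current classifier, and the updated targets would supervise the next training round.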

Fontanille, Jacques. "La sémiotique est-elle un art ? Le faire sémiotique comme « art libéral »". In Arts du faire : production et expertise. Limoges: Université de Limoges, 2009. http://dx.doi.org/10.25965/as.3343.

Abstract:

With reference to the medieval classification of cultural activities and domains, semiotics, understood primarily as a "doing" (faire), would be one of the contemporary "liberal arts", that is, in the usual sense, "those in which intellectual work is dominant", which sets them apart from, for example, the "mechanical arts" or the "fine arts", which bring other dominant faculties into play. From this perspective, of course, semiotics loses its character as a "scientific project" in Greimas's sense, that is, as generalizable, projective knowledge constructed by hypothetico-deductive means and resting on a theory, models and empirical methods. But it loses that character no more than medicine does when it moves from so-called "in vitro" research to so-called "clinical" research; just as for medicine, what is at stake is the passage from a fundamental science to a "scientific practice". Semiotics considered as an art is therefore a practice in which intelligence, sensibility, emotion and taste all play a part. And, again like medicine, it is a practice whose "text" is a scientific discourse. It is against the background of this general problematic that, after delimiting semiotic doing as an "art" and as a "practice", I would like to sketch a description of some typical semiotic practices, notably those of Jean-Marie Floch, Eric Landowski, Algirdas Julien Greimas and Claude Zilberberg.

Shamardina, Tatiana, Vladislav Mikhailov, Daniil Chernianskii, Alena Fenogenova, Marat Saidov, Anastasiya Valeeva, Tatiana Shavrina, Ivan Smurov, Elena Tutubalina and Ekaterina Artemova. "Findings of the The RuATD Shared Task 2022 on Artificial Text Detection in Russian". In Dialogue. RSUH, 2022. http://dx.doi.org/10.28995/2075-7182-2022-21-497-511.

Abstract:

We present the shared task on artificial text detection in Russian, which is organized as a part of the Dialogue Evaluation initiative, held in 2022. The shared task dataset includes texts from 14 text generators, i.e., one human writer and 13 text generative models fine-tuned for one or more of the following generation tasks: machine translation, paraphrase generation, text summarization, text simplification. We also consider back-translation and zero-shot generation approaches. The human-written texts are collected from publicly available resources across multiple domains. The shared task consists of two sub-tasks: (i) to determine if a given text is automatically generated or written by a human; (ii) to identify the author of a given text. The first task is framed as a binary classification problem. The second task is a multi-class classification problem. We provide count-based and BERT-based baselines, along with the human evaluation on the first sub-task. A total of 30 and 8 systems have been submitted to the binary and multi-class sub-tasks, respectively. Most teams outperform the baselines by a wide margin. We publicly release our codebase, human evaluation results, and other materials in our GitHub repository.

Chen, Jun, Quan Yuan, Chao Lu and Haifeng Huang. "A Novel Sequence-to-Subgraph Framework for Diagnosis Classification". In Thirtieth International Joint Conference on Artificial Intelligence {IJCAI-21}. California: International Joint Conferences on Artificial Intelligence Organization, 2021. http://dx.doi.org/10.24963/ijcai.2021/496.

Abstract:

Text-based diagnosis classification is a critical problem in AI-enabled healthcare studies, which assists clinicians in making correct decisions and lowering the rate of diagnostic errors. Previous studies follow the routine of sequence-based deep learning models in the NLP literature to deal with clinical notes. However, recent studies find that structural information is important in clinical content and greatly impacts the predictions. In this paper, a novel sequence-to-subgraph framework is introduced to process clinical texts for classification, which changes the paradigm of managing texts. Moreover, a new classification model under the framework is proposed that incorporates a subgraph convolutional network and a hierarchical diagnostic attentive network to extract the layered structural features of clinical texts. The evaluation conducted on both real-world English and Chinese datasets shows that the proposed method outperforms the state-of-the-art deep learning based diagnosis classification models.

Subramaniam, Raghav. "Examining Accuracy Heterogeneities in Classification of Multilingual". In 8th International Conference on Software Engineering. Academy & Industry Research Collaboration, 2023. http://dx.doi.org/10.5121/csit.2023.131221.

Abstract:

Tools for detection of AI-generated texts are used globally; however, the nature of the apparent accuracy disparities between languages must be further observed. This paper aims to examine the nature of these differences through testing OpenAI's "AI Text Classifier" on a set of various AI and human-generated texts in English, Swahili, German, Arabic, Chinese, and Hindi. Current tools for detecting AI-generated text are already fairly easy to discredit, as misclassifications have been shown to be fairly common, but such vulnerabilities often persist in slightly different ways when non-English languages are observed: classification of human-written text as AI-generated and vice versa could occur more frequently in specific language environments than others. Our findings indicate that false positives are more likely to occur in Hindi and Arabic, whereas false negative labelings are more likely to occur in English. Other languages tested had a tendency to not be confidently labeled at all.

Organization reports on the topic "Classification texte"

Furey, John, Austin Davis and Jennifer Seiter-Moser. Natural language indexing for pedoinformatics. Engineer Research and Development Center (U.S.), September 2021. http://dx.doi.org/10.21079/11681/41960.

Abstract:

The multiple schemas for the classification of soils rely on differing criteria, but the major soil science systems, including the United States Department of Agriculture (USDA) and the international harmonized World Reference Base for Soil Resources soil classification systems, are primarily based on inferred pedogenesis. Largely, these classifications are compiled from individual observations of soil characteristics within soil profiles, and the vast majority of this pedologic information is contained in nonquantitative text descriptions. We present initial text mining analyses of parsed text in the digitally available USDA soil taxonomy documentation and the Soil Survey Geographic database. Previous research has shown that latent information structure can be extracted from scientific literature using Natural Language Processing techniques, and we show that this latent information can be used to expedite query performance by using syntactic elements and part-of-speech tags as indices. Technical vocabulary often poses a text mining challenge due to the rarity of its diction in the broader context. We introduce an extension to the common English vocabulary that allows for nearly complete indexing of USDA Soil Series Descriptions.

Seo, Young-Woo, Joseph Giampapa and Katia Sycara. Text Classification for Intelligent Portfolio Management. Fort Belvoir, VA: Defense Technical Information Center, May 2002. http://dx.doi.org/10.21236/ada595830.

Dasigi, V. R., R. C. Mann and V. Protopopescu. Multi-sensor text classification experiments -- a comparison. Office of Scientific and Technical Information (OSTI), January 1997. http://dx.doi.org/10.2172/638201.

Han, Euihong, George Karypis and Vipin Kumar. Text Categorization Using Weight Adjusted k-Nearest Neighbor Classification. Fort Belvoir, VA: Defense Technical Information Center, May 1999. http://dx.doi.org/10.21236/ada439688.

Kuzmina, Aleksandra, Amalia Kuregyan and Ekaterina Pertsevaya. PSUDOINTERNATIONAL WORDS IN THE TRANSLATION OF ECONOMIC TEXTS CARRIED OUT BY THE STUDENTS OF NON-LINGUISTIC UNIVERSITIES. Crimean Federal University named after V.I. Vernadsky, 2023. http://dx.doi.org/10.12731/ttxnbz.

Abstract:

The article deals with the problems of translating pseudo-international words in economic texts. Incorrect interpretations of pseudo-international words in written texts and oral translations are investigated. It is noted that errors in the written version appear mainly due to the use of the most common full-text translation services, where the word spelling is a priority. For oral translation, the first variant of incorrect interpretation is more typical, when the word is pronounced similarly to Russian, but is not its analogue. The paper presents the classification of pseudo-international words according to the parts of speech: noun, adjective, verb and adverb, and also provides typical mistakes that students make when translating this vocabulary. The authors of the article also present tasks that are the most effective way to overcome misinterpretations of words related to pseudo-internationalisms.

Chew, Robert F., Kirsty J. Weitzel, Peter Baumgartner, Caroline W. Oppenheimer, Brianna D'Arcangelo, Autumn Barnes, Shirley Liu, Adam Bryant Miller, Ashley Lowe and Anna C. Yaros. Improving Text Classification with Boolean Retrieval for Rare Categories: A Case Study Identifying Firearm Violence Conversations in the Crisis Text Line Database. RTI Press, March 2023. http://dx.doi.org/10.3768/rtipress.2023.mr.0050.2304.

Abstract:

Advancements in machine learning and natural language processing have made text classification increasingly attractive for information retrieval. However, developing text classifiers is challenging when no prior labeled data are available for a rare category of interest. Finding instances of the rare class using a uniform random sample can be inefficient and costly due to the rare category’s low base rate. This work presents an approach that combines the strengths of text classification and Boolean retrieval to help learn rare concepts of interest. As a motivating example, we use the task of finding conversations that reference firearm injury or violence in the Crisis Text Line database. Identifying rare categories, like firearm injury or violence, can improve crisis lines' abilities to support people with firearm-related crises or provide appropriate resources. Our approach outperforms a set of iteratively refined Boolean queries and results in a recall of 0.91 on a test set generated from a process independent of our study. Our results suggest that text classification with Boolean retrieval initialization can be effective for finding rare categories of interest and improve on the precision of using Boolean retrieval alone.
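
The idea of using Boolean retrieval to find candidate examples of a rare class can be sketched as below. This is a hypothetical illustration, not the report's pipeline: the messages, the query, and the function names are invented, and the later steps (annotating the candidates and training a classifier on them) are not shown.

```python
import re


def tokenize(text):
    """Lowercase a message and return its set of word tokens."""
    return set(re.findall(r"[a-z']+", text.lower()))


def matches(tokens, query):
    """Boolean query in conjunctive form: every group must match (AND),
    and a group matches if any of its alternatives occurs (OR)."""
    return all(tokens & group for group in query)


# Hypothetical messages; the rare class is "mentions firearm injury or violence".
corpus = [
    "my brother was shot with a gun last night",
    "i feel alone and can't sleep",
    "he threatened me with a pistol",
    "we watched a movie about a gun store",
]

# (gun OR firearm OR pistol) AND (shot OR hurt OR threatened)
query = [{"gun", "firearm", "pistol"}, {"shot", "hurt", "threatened"}]

# Candidates to send to annotators instead of a uniform random sample.
candidates = [text for text in corpus if matches(tokenize(text), query)]
print(candidates)
```

Because the query concentrates the rare class in the candidate pool, annotators see far more positive examples per labeled item than they would under uniform sampling, at the cost of a sample biased toward the query terms.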

Dasigi, V. R., and R. C. Mann. Toward a multi-sensor-based approach to automatic text classification. Office of Scientific and Technical Information (OSTI), October 1995. http://dx.doi.org/10.2172/130610.

Dewdney, Nigel, Carol VanEss-Dykema and Richard MacMillan. The Form is the Substance: Classification of Genres in Text. Fort Belvoir, VA: Defense Technical Information Center, January 2001. http://dx.doi.org/10.21236/ada460898.

Pavlicheva, E. N., V. P. Meshalkin and N. S. CHikunov. Algorithm for processing text data for an automatic classification problem using the word2vec method. OFERNIO, February 2021. http://dx.doi.org/10.12731/ofernio.2021.24759.

KNYAZEVA, V., A. BILYALOVA and E. IBRAGIMOVA. INTERTEXT AS A LEXICAL AND SEMANTIC TOOL OF SUGGESTION. Science and Innovation Center Publishing House, 2022. http://dx.doi.org/10.12731/2077-1770-2022-14-2-3-39-49.

Abstract:

The article describes intertextuality as a lexico-semantic tool of linguistic suggestion and examines its ability to constitute the manipulative power of authority within political media discourse. Following a thorough study of linguopragmatics and suggestive linguistics from the perspective of their theoretical grounds, we aimed to classify lexico-semantic tools that could enable an authority to become a manipulative power of political media texts. Intertextuality caught our attention as an element of the aforementioned classification. The phenomenon, which represents the overlap and interaction of several texts, is backed up by recent examples gathered from some Russian and foreign Internet periodicals. Allusion and quotation, as subtypes of intertextuality, were highlighted in the research.