Theses on the topic "Classification texte"
Consult the top 50 theses for your research on the topic "Classification texte".
Tisserant, Guillaume. "Généralisation de données textuelles adaptée à la classification automatique". Thesis, Montpellier, 2015. http://www.theses.fr/2015MONTS231/document.
Text classification has been studied for a long time. Early on, many documents of different types were grouped in order to centralize knowledge. Classification and indexing systems were then created. They make it easy to find documents based on readers' needs. With the increasing number of documents and the appearance of computers and the internet, the implementation of text classification systems has become a critical issue. However, textual data, complex and rich in nature, are difficult to process automatically. In this context, this thesis proposes an original methodology to organize and facilitate access to textual information. Our automatic classification approach and our semantic information extraction enable us to find relevant information quickly. Specifically, this manuscript presents new forms of text representation that facilitate processing for automatic classification. A partial generalization of textual data (the GenDesc approach) based on statistical and morphosyntactic criteria is proposed. Moreover, this thesis focuses on phrase construction and on the use of semantic information to improve the representation of documents. We demonstrate through numerous experiments the relevance and genericity of our proposals and show that they improve classification results. Finally, as social networks are developing strongly, a method for the automatic generation of semantic hashtags is proposed. Our approach is based on statistical measures, semantic resources and the use of syntactic information. The generated hashtags can then be exploited for information retrieval tasks on large volumes of data.
Danuser, Hermann. "Der Text und die Texte. Über Singularisierung und Pluralisierung einer Kategorie". Bärenreiter Verlag, 1998. https://slub.qucosa.de/id/qucosa%3A36795.
Vasil'eva, Natalija. "Eigennamen in der Welt zeitgenössischer Texte". Gesellschaft für Namenkunde e.V, 2007. https://ul.qucosa.de/id/qucosa%3A31517.
Lasch, Alexander. "Texte im Handlungsbereich der Religion". De Gruyter, 2011. https://tud.qucosa.de/id/qucosa%3A74840.
Sayadi, Karim. "Classification du texte numérique et numérisé. Approche fondée sur les algorithmes d'apprentissage automatique". Thesis, Paris 6, 2017. http://www.theses.fr/2017PA066079/document.
Different disciplines in the humanities, such as philology or palaeography, face complex and time-consuming tasks when examining their data sources. The introduction of computational approaches in the humanities makes it possible to address issues such as semantic analysis and systematic archiving. The conceptual models developed are based on algorithms that are later hard-coded in order to automate these tedious tasks. In the first part of the thesis, we propose a novel method to build a semantic space based on topic modeling. In the second part, in order to classify historical documents according to their script, we propose a novel representation learning method based on stacked convolutional auto-encoders. The goal is to automatically learn plot representations of the script or the written language.
Schneider, Ulrich Johannes. "Über Tempel und Texte: ein Bildervergleich". Fink, 1999. https://ul.qucosa.de/id/qucosa%3A12768.
Akama, Hiroyuki. "Tableau, corps, texte : etudes historiques sur la classification-recit en france au xixe siecle". Paris 1, 1992. http://www.theses.fr/1992PA010604.
Through the "story" named ideology (of Cabanis and Tracy), the history of nineteenth-century "classification" can be divided into three distinct stages, each with its own means of embodying the complex of "table-body-text" (tableau-corps-texte). First, an epistemological rupture of the "table" (tableau), which was a matter for the element of ideology, and, in consequence, the phenomena of "inner dia-textuality" suppressing the possibilities of this current of thought. Secondly, the appearance of some cone-shaped encyclopedic spaces of knowledge, and their transformations to materialize the "positivist" significations of sociology and anthropology. Thirdly, another rupture of the "table" (tableau), as a result of which the spaces of knowledge became fluid and came to symbolize the anti-positivist crisis of science, and finally a tripartite unity of "discourses" (discours) emerged: hysteria, science fiction and the institute-university.
Felhi, Mehdi. "Document image segmentation : content categorization". Thesis, Université de Lorraine, 2014. http://www.theses.fr/2014LORR0109/document.
In this thesis I discuss the document image segmentation problem and describe our new approaches for detecting and classifying document contents. First, I discuss our skew angle estimation approach, whose aim is to estimate automatically and precisely the skew angle of text in document images. Our method is based on Maximum Gradient Difference (MGD) and R-signature. Then, I describe our second method, based on the Ridgelet transform. Our second contribution is a new hybrid page segmentation approach. I first describe our stroke-based descriptor, which detects text and line candidates using the skeleton of the binarized document image. Then, an active contour model is applied to segment the rest of the image into photo and background regions. Finally, text candidates are clustered according to their sizes using mean-shift analysis. The method is applied to segmenting scanned document images (newspapers and magazines) that contain text, lines and photo regions. Finally, I describe our stroke-based text extraction method. Our approach begins by extracting connected components and selecting text character candidates over the CIE LCH color space, using Histogram of Oriented Gradients (HOG) correlation coefficients in order to detect low-contrast regions. The text region candidates are clustered using two different approaches: a depth-first search over a graph, and a stable text line criterion. Finally, the resulting regions are refined by classifying the text line candidates into "text" and "non-text" regions using a kernel Support Vector Machine (K-SVM) classifier.
Mazyad, Ahmad. "Contribution to automatic text classification : metrics and evolutionary algorithms". Thesis, Littoral, 2018. http://www.theses.fr/2018DUNK0487/document.
This thesis deals with natural language processing and text mining, at the intersection of machine learning and statistics. We are particularly interested in Term Weighting Schemes (TWS) in the context of supervised learning and specifically the Text Classification (TC) task. In TC, the multi-label classification task has gained a lot of interest in recent years. Multi-label classification from textual data may be found in many modern applications, such as news classification, where the task is to find the categories that a newswire story belongs to (e.g., politics, middle east, oil) based on its textual content; music genre classification (e.g., jazz, pop, oldies, traditional pop) based on customer reviews; film classification (e.g., action, crime, drama); and product classification (e.g., Electronics, Computers, Accessories). Traditional classification algorithms are generally binary classifiers, and they are not suited to multi-label classification. The multi-label classification task is, therefore, transformed into multiple single-label binary tasks. However, this transformation introduces several issues. First, term distributions are only considered in relation to the positive and the negative categories (i.e., information on the correlations between terms and categories is lost). Second, it fails to consider any label dependency (i.e., information on existing correlations between classes is lost). Finally, since all categories but one are grouped into one category (the negative category), the newly created tasks are imbalanced. This information is commonly used by supervised TWS to improve the effectiveness of the classification system. Hence, after presenting the process of multi-label text classification, and more particularly the TWS, we make an empirical comparison of these methods applied to the multi-label text classification task. We find that the superiority of the supervised methods over the unsupervised methods is still not clear.
We then show that these methods are not fully adapted to the multi-label classification problem and that they ignore much statistical information that could be used to improve the classification results. Thus, we propose a new TWS based on information gain. This new method takes into consideration the term distribution, not only regarding the positive and the negative categories but also in relation to all classes. Finally, aiming at finding specialized TWS that also solve the issue of imbalanced tasks, we studied the benefits of using genetic programming for generating TWS for the text classification task. Unlike previous studies, we generate formulas by combining statistical information at a microscopic level (e.g., the number of documents that contain a specific term) instead of using complete TWS. Furthermore, we make use of categorical information (e.g., the number of categories where a term occurs). Experiments are made to measure the impact of these methods on the performance of the model. We show through these experiments that the results are positive.
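As an illustration of the kind of statistic a supervised term weighting scheme can draw on, the sketch below computes the information gain of a term for one category from four document counts. It is a minimal, generic formulation (mutual information between term presence and class membership); the function name and the count parameters `a`, `b`, `c`, `d` are assumptions made for this example, and it is not the weighting scheme proposed in the thesis.

```python
import math


def information_gain(a, b, c, d):
    """Information gain of a term for one category, in bits.

    a: positive documents containing the term
    b: negative documents containing the term
    c: positive documents without the term
    d: negative documents without the term
    (Illustrative parameter names, not taken from the thesis.)
    """
    n = a + b + c + d
    if n == 0:
        return 0.0
    # Each cell: (joint count, term-presence marginal, class marginal).
    cells = [
        (a, a + b, a + c),
        (b, a + b, b + d),
        (c, c + d, a + c),
        (d, c + d, b + d),
    ]
    gain = 0.0
    for joint, term_total, class_total in cells:
        if joint > 0:  # empty cells contribute nothing
            gain += (joint / n) * math.log2(joint * n / (term_total * class_total))
    return gain
```

A term that occurs in every positive document and in no negative one gets the maximum gain for a balanced binary task, while a term spread evenly across both categories gets zero.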
Bastos, Dos Santos José Eduardo. "L'identification de texte en images de chèques bancaires brésiliens". Compiègne, 2003. http://www.theses.fr/2003COMP1453.
Identifying and distinguishing text in document images are tasks whose current solutions are mainly based on contextual information, such as layout information or information from the physical structure. In this research work, an alternative is investigated that relies only on features observed from textual elements, giving the process more independence. The whole process was developed considering textual elements fragmented into small portions (samples), in order to provide an alternative solution to questions like scale and overlapping textual elements. From these samples, a set of features is extracted and serves as input to a classifier mainly charged with extracting text from the document and distinguishing between handwritten and machine-printed text. Moreover, since the only information employed is observed directly from textual elements, the process is more independent, as it uses neither heuristics nor a priori information about the document being processed. Results of around 93% correct classification confirm the efficacy of the process.
Dalloux, Clément. "Fouille de texte et extraction d'informations dans les données cliniques". Thesis, Rennes 1, 2020. http://www.theses.fr/2020REN1S050.
With the introduction of clinical data warehouses, more and more health data are available for research purposes. While a significant part of these data exist in structured form, much of the information contained in electronic health records is available in free text form that can be used for many tasks. In this manuscript, two tasks are explored: the multi-label classification of clinical texts and the detection of negation and uncertainty. The first is studied in cooperation with the Rennes University Hospital, owner of the clinical texts that we use, while, for the second, we use publicly available biomedical texts that we annotate and release free of charge. In order to solve these tasks, we propose several approaches based mainly on deep learning algorithms, used in supervised and unsupervised learning situations.
Janus, Wolfgang. "Texte barrierefrei gestalten – Leichte Sprache und die Annäherung zum Themenfeld jüdisches Leben". HATiKVA e.V. – Die Hoffnung Bildungs- und Begegnungsstätte für Jüdische Geschichte und Kultur Sachsen, 2016. https://slub.qucosa.de/id/qucosa%3A34837.
Trinh, Anh Phuc. "Classifieur probabiliste et séparateur à vaste marge : application à la classification de texte et à l'étiquetage d'image". Paris 6, 2012. http://www.theses.fr/2012PA066060.
Seidel, Wilhelm. "Schreiben im Diskurs. Über Form und Inhalt musikästhetischer Texte des 18. Jahrhunderts". Bärenreiter Verlag, 1998. https://slub.qucosa.de/id/qucosa%3A36823.
Kempke, Matthias. "Fotos und Texte von der Visitationsreise des Leipziger Missionsdirektors Carl Ihmels nach Tanganyika: 1927". Universität Leipzig, 2006. https://ul.qucosa.de/id/qucosa%3A34425.
Fell, Michael. "Traitement automatique des langues pour la recherche d'information musicale : analyse profonde de la structure et du contenu des paroles de chansons". Thesis, Université Côte d'Azur, 2020. http://www.theses.fr/2020COAZ4017.
Applications in Music Information Retrieval and Computational Musicology have traditionally relied on features extracted from the music content in the form of audio, but mostly ignored the song lyrics. More recently, improvements in fields such as music recommendation have been made by taking into account external metadata related to the song. In this thesis, we argue that extracting knowledge from the song lyrics is the next step to improve the user's experience when interacting with music. To extract knowledge from vast amounts of song lyrics, we show for different textual aspects (their structure, content and perception) how Natural Language Processing methods can be adapted and successfully applied to lyrics. For the structural aspect of lyrics, we derive a structural description by introducing a model that efficiently segments the lyrics into their characteristic parts (e.g. intro, verse, chorus). In a second stage, we represent the content of lyrics by summarizing them in a way that respects the characteristic lyrics structure. Finally, on the perception of lyrics, we investigate the problem of detecting explicit content in a song text. This task proves to be very hard, and we show that the difficulty partially arises from the subjective nature of perceiving lyrics in one way or another depending on the context. Furthermore, we touch on another problem of lyrics perception by presenting our preliminary results on Emotion Recognition. As a result, during the course of this thesis we have created the annotated WASABI Song Corpus, a dataset of two million songs with NLP lyrics annotations on various levels.
Wei, Zhihua. "The research on chinese text multi-label classification". Thesis, Lyon 2, 2010. http://www.theses.fr/2010LYO20025/document.
The thesis centers on text classification, a rapidly expanding field with many current and potential applications. Its main contributions concern two points. The first is the specifics of encoding and automatically processing the Chinese language: words may consist of one, two or three characters; there is no typographic separation between words; the words of a sentence can appear in a large number of possible orders; all of which leads to difficult ambiguity problems. Encoding text as "n-grams" (sequences of n = 1, 2 or 3 characters) is particularly well suited to Chinese, because it is fast and requires neither a prior dictionary-based word recognition step nor word segmentation. The second is multi-label classification, in which each individual can be assigned to one or more classes. For texts, the classes sought correspond to themes (topics), and a single text may be attached to one or more themes. This multi-label approach is more general: the same patient may suffer from several pathologies; the same company may be active in several industrial or service sectors. The thesis analyzes these problems and attempts to provide solutions, first for single-label classifiers, then for multi-label ones. The difficulties include the definition of the variables that characterize the texts, their large number, the handling of sparse tables (many zeros in the matrix crossing texts and descriptors), and the relatively poor performance of the usual multi-class classifiers.
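Text classification is an important research field in information science with considerable practical value. As the content handled by text classification becomes increasingly complex and diverse, and classification targets become more varied, developing effective text classification techniques that meet real application needs has become a challenging task, and research on multi-label classification has emerged in response. Building on an analysis of a large number of single-label and multi-label text classification algorithms, this thesis addresses the high dimensionality of features in text representation, data sparsity, and the high complexity and low accuracy of multi-label classification, attempting to solve them from different angles using rough set theory and proposing corresponding algorithms. The main contributions are as follows. To address the curse of dimensionality that arises when n-grams are used as Chinese text features, a two-step feature selection method is proposed that combines the removal of rare within-class features with between-class feature selection; extensive experiments on a large-scale Chinese corpus examine the choice of n, the choice of feature weights, and feature correlation when n-grams are used as features, and yield some useful conclusions. To address the low efficiency and high cost of classification when texts are represented by high-dimensional features, a multi-label text classification algorithm based on the LDA model is proposed, which uses the topics extracted by LDA as text features to build an efficient classifier. Under the PT3 multi-label problem transformation method, this algorithm performs well on both Chinese and English datasets, on par with the multi-label classification methods currently regarded as the best. To address the arbitrariness of existing smoothing strategies for the LDA model, a smoothing strategy for the LDA language model based on tolerance rough sets is proposed. It first constructs tolerance classes of words over the global vocabulary, then assigns smoothing values to the out-of-vocabulary words of each class of documents according to the frequencies of the words in the tolerance classes. Extensive experiments on Chinese and English, balanced and imbalanced corpora show that this smoothing method significantly improves the classification performance of the LDA model, especially on imbalanced corpora. To address the high complexity and low accuracy of multi-label classification, a hybrid multi-label text classification framework based on variable precision rough sets is proposed. The framework partitions the text feature space using the variable precision rough set method, thereby decomposing the multi-label classification problem into several two-class single-label problems and several multi-label problems with fewer labels. That is, when an unknown text falls in the lower approximation region of a class, a simple single-label classifier can determine its category directly; when it falls in the boundary region, the multi-label classifier of the corresponding region is used. Experiments show that both classification accuracy and algorithmic efficiency improve considerably under this framework. The thesis also designs and implements a web search result visualization system based on multi-label classification (MLWC), which directly retrieves the results returned by a search engine and classifies them in real time with an improved Naïve Bayes multi-label classification algorithm, so that users can quickly locate the texts of interest among the search results.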
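The character n-gram encoding mentioned in this abstract can be sketched in a few lines. The function below is a hypothetical helper written for this illustration (its name and the `n_values` parameter are assumptions, not taken from the thesis); it slides a window of 1, 2 or 3 characters over the text, so no dictionary lookup or word segmentation is needed.

```python
def char_ngrams(text, n_values=(1, 2, 3)):
    """Return the character n-grams of `text` for each n in `n_values`.

    Whitespace is dropped first; the remaining characters are grouped
    into overlapping windows of length n.
    """
    chars = [ch for ch in text if not ch.isspace()]
    grams = []
    for n in n_values:
        for i in range(len(chars) - n + 1):
            grams.append("".join(chars[i:i + n]))
    return grams
```

For example, `char_ngrams("文本分类", (2,))` yields the three bigrams 文本, 本分 and 分类. The resulting feature space grows quickly with corpus size, which is the dimensionality problem the two-step feature selection in the abstract is meant to address.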
Moreno, Villanueva José Antonio. "El 'Essai sur l'électricité des corps' (1746) de Jean-Antoine Nollet: Primer texte sobre física eléctrica traducido al espaniol". Universität Leipzig, 1997. https://ul.qucosa.de/id/qucosa%3A33052.
Texto completoMackert, Christoph. "„Musica est ars ex septem liberalibus una: Musiktheoretische Texte in mittelalterlichen Handschriften aus Leipziger Universitätsgebrauch". Verlag Janos Stekovics, 2010. https://ul.qucosa.de/id/qucosa%3A75002.
Texto completoLasch, Alexander. "Sind serielle Texte ein Gegenstand linguistischer Diskursanalyse?: Zu diskursbestätigenden und diskursverändernden ‚Lebensbeschreibungen‘ in rituellen Kontexten". Springer, 2013. https://tud.qucosa.de/id/qucosa%3A74898.
Texto completoDaunoravičienė, Gražina. ""Baltos lankos": Texte und Interpretationen - Almanach der Musiksemiotik, Vilnius (Baltos lankos), 1997, 246 S. (litauisch) [Rezension]". Musikgeschichte in Mittel- und Osteuropa ; 5 (1999), S. 205-210, 1999. https://ul.qucosa.de/id/qucosa%3A15668.
Texto completoNurse, Derek. "Historical texts from the Swahili coast". Swahili Forum 1 (1994) S. 47-85, 1994. https://ul.qucosa.de/id/qucosa%3A11607.
Texto completoGasser, Wolfgang. "„Das Ende (m)einer Kindheit?“: Wissenschaft und Selbstbezüge – Jugendliche analysieren Texte und Video-Interviews zu Kindertransporten". HATiKVA e.V. – Die Hoffnung Bildungs- und Begegnungsstätte für Jüdische Geschichte und Kultur Sachsen, 2015. https://slub.qucosa.de/id/qucosa%3A34939.
Gerhards, Simone, and Simon Schweitzer. "Auf dem Weg zu einem TEI-Austauschformat für ägyptisch-koptische Texte". Universitätsbibliothek Leipzig, 2016. http://nbn-resolving.de/urn:nbn:de:bsz:15-qucosa-201602.
Albitar, Shereen. "De l'usage de la sémantique dans la classification supervisée de textes : application au domaine médical". Thesis, Aix-Marseille, 2013. http://www.theses.fr/2013AIXM4343/document.
The main interest of this research is the effect of using semantics in the process of supervised text classification. This effect is evaluated through an experimental study on documents related to the medical domain, using the UMLS (Unified Medical Language System) as a semantic resource. This evaluation follows four scenarios involving semantics at different steps of the classification process: the first scenario incorporates the conceptualization step, where text is enriched with corresponding concepts from UMLS; both the second and the third scenarios concern enriching the vectors that represent text as a Bag of Concepts (BOC) with similar concepts; the last scenario considers using semantics during class prediction, where concepts as well as the relations between them are involved in decision making. We test the first scenario using three popular classification techniques: Rocchio, NB and SVM. We choose Rocchio for the other scenarios for its extendibility with semantics. Experimental results demonstrate a significant improvement in classification performance when conceptualization is applied before indexing. Moderate improvements are reported when using conceptualized text representation with semantic enrichment after indexing, or with text-to-text semantic similarity measures for prediction.
Krehl, Birgit. "Frühe lyrische Texte Julian Tuwims und der Große Krieg. „Sie schlagen Juden! Lustig! Ha-ha-ha!“". HATiKVA e.V. – Die Hoffnung Bildungs- und Begegnungsstätte für Jüdische Geschichte und Kultur Sachsen, 2016. https://slub.qucosa.de/id/qucosa%3A34822.
Bitterlich, Thomas. "Die Schrift der Zivilisation in Yasmina Rezas "Der Gott des Gemetzels"". Universitätsbibliothek Leipzig, 2014. http://www.kulturtechnik-schreiben.imz.uni-erlangen.de/veranstaltungen-texte/schreiben-im-theater.shtml.
Süß, Ina. "Christus im Diskurs mit Muhammad - Das Ringen um religiöse Identität: Die Auseinandersetzung der syrischen Christen mit dem Islam anhand ausgewählter Texte des Johannes Damaskenos und des Theodor Abū Qurra". Master's thesis, Universitätsverlag der Technischen Universität Chemnitz, 2013. https://monarch.qucosa.de/id/qucosa%3A20186.
For many people, religion is an important component of their being. They identify and define themselves through their affiliation to it. Every competing world view is mostly regarded as a menace and is fought more or less strongly in word, writing or action. The discussion with Islam in particular has intensified drastically in recent years and leads again and again to fierce verbal or violent attacks. Nevertheless, the struggle over understanding or demarcation, and the conflicts and discussions linked with it, are not new but run like a red thread through history. From today's perspective, the development of the early debate in the place of origin of Islam is therefore of interest. In what manner and by what means did the Christians immediately affected by Arab rule argue with the new religion? How did the patterns of argumentation develop at the beginnings of the religious discourse? Which principal religious differences were perceived and taken up as central themes? These questions are explored with the help of selected texts by Johannes Damaskenos and Theodor Abū Qurra.
Mercadier, Yves. "Classification automatique de textes par réseaux de neurones profonds : application au domaine de la santé". Thesis, Montpellier, 2020. http://www.theses.fr/2020MONTS068.
This Ph.D. thesis focuses on the analysis of textual data in the health domain, and in particular on the supervised multi-class classification of data from biomedical literature and social media. One of the major difficulties when exploring such data with supervised learning methods is having a sufficient number of data sets for model training. Indeed, it is generally necessary to label the data manually before performing the learning step. The large size of the data sets makes this labeling task very expensive, a cost that should be reduced with semi-automatic systems. In this context, active learning, in which the oracle intervenes to choose the best examples to label, is promising. The intuition is as follows: by choosing the examples smartly rather than randomly, the models should improve with less effort for the oracle and therefore at lower cost (i.e. with fewer annotated examples). In this thesis, we evaluate different active learning approaches combined with recent deep learning models. In addition, when only a small annotated data set is available, one possibility for improvement is to artificially increase the quantity of data during the training phase by automatically creating new data from existing data. More precisely, we inject knowledge by taking into account the invariance properties of the data with respect to certain transformations. The augmented data can thus cover an unexplored part of the input space, avoid overfitting and improve the generalization of the model. In this thesis, we propose and evaluate a new approach for textual data augmentation. These two contributions are evaluated on different textual datasets in the medical domain.
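The intuition of choosing examples "smartly" can be made concrete with least-confidence uncertainty sampling, one common active learning strategy. The sketch below is an assumption-laden illustration: the function name and its inputs are hypothetical, and the abstract does not say which selection strategies the thesis actually evaluates.

```python
def least_confident(probabilities, k):
    """Pick the k unlabeled examples the model is least sure about.

    probabilities: one list of predicted class probabilities per
    unlabeled example (hypothetical model output).
    Returns example indices, least confident first; ties keep the
    lower index first.
    """
    # Confidence of an example = probability of its most likely class.
    ranked = sorted((max(p), i) for i, p in enumerate(probabilities))
    return [i for _, i in ranked[:k]]
```

In a full loop, the selected examples would be sent to the human annotator (the oracle), added to the training set, and the model retrained before the next selection round.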
Gräbe, Hans-Gert. "Technik und Gesellschaft. Rudolf Rochhausen zum Gedenken.: Texte und Erinnerungen zur Dahlener Tagung 2012". Hans-Gert Gräbe, 2012. https://ul.qucosa.de/id/qucosa%3A11382.
Usunier, Nicolas. "Apprentissage de fonctions d'ordonnancement : une étude théorique de la réduction à la classification et deux applications à la recherche d'information". Paris 6, 2006. http://www.theses.fr/2006PA066425.
Berio, Luciano. "Text of Texts". Bärenreiter Verlag, 1998. https://slub.qucosa.de/id/qucosa%3A36791.
Nurse, Derek. "Historical texts from the Swahili coast (part 2)". Swahili Forum; 2 (1995), S. 41-72, 1995. https://ul.qucosa.de/id/qucosa%3A11618.
Bitterlich, Thomas. "Pinguine schreiben nicht". Universitätsbibliothek Leipzig, 2014. http://www.kulturtechnik-schreiben.imz.uni-erlangen.de/veranstaltungen-texte/schreiben-im-theater.shtml.
Bitterlich, Thomas. "Können Dramen und Theateraufführungen als Schrift begriffen werden?" Universitätsbibliothek Leipzig, 2012. http://www.kulturtechnik-schreiben.imz.uni-erlangen.de/veranstaltungen-texte/schreiben-im-theater.shtml.
Lucarelli, Rita. "Images of eternity in 3D". Universitätsbibliothek Leipzig, 2016. http://nbn-resolving.de/urn:nbn:de:bsz:15-qucosa-201685.
El, Jed Olfa. "WebSum : système de résumé automatique de réponses des moteurs de recherche". Toulouse 3, 2006. http://www.theses.fr/2006TOU30145.
This thesis lies within the general framework of information retrieval and, more precisely, within the framework of web document classification and organization. Our objective is to develop a system for automatically summarizing search engine answers in an encyclopaedic style (WebSum). This type of summary aims at classifying the search engine answers according to the various topics, or what we call in our work the facets, of the user query. To achieve this objective, we propose: a method for identifying the facets of a given query based on the generative lexicon; an approach for classifying the search engine answers under these various facets; and a method for evaluating the relevance of the web pages.
Beyer, Stefan, Biase-Dyson Camilla Di y Nina Wagenknecht. "Annotating figurative language". Universitätsbibliothek Leipzig, 2016. http://nbn-resolving.de/urn:nbn:de:bsz:15-qucosa-201537.
Berti, Monica. "The Digital Marmor Parium". Epigraphy Edit-a-thon : editing chronological and geographic data in ancient inscriptions ; April 20-22, 2016 / edited by Monica Berti. Leipzig, 2016. Beitrag 4, 2016. https://ul.qucosa.de/id/qucosa%3A14455.
Schmidt, Annalena. "Von weißen Flecken der Erinnerungslandschaft und neuen Chancen für die Forschung. GeoBib: Eine annotierte und georeferenzierte Onlinebibliographie der Texte der frühen deutsch- und polnischsprachigen Holocaust- und Lagerliteratur (1933–1949)". HATiKVA e.V. – Die Hoffnung Bildungs- und Begegnungsstätte für Jüdische Geschichte und Kultur Sachsen, 2015. https://slub.qucosa.de/id/qucosa%3A34866.
El, Haj Abir. "Stochastics blockmodels, classifications and applications". Thesis, Poitiers, 2019. http://www.theses.fr/2019POIT2300.
This PhD thesis focuses on the analysis of weighted networks, where each edge is associated with a weight representing its strength. We introduce an extension of the binary stochastic block model (SBM), called the binomial stochastic block model (bSBM). This question is motivated by the study of co-citation networks in a text mining context, where data is represented by a graph. Nodes are words, and each edge joining two words is weighted by the number of documents in the corpus that cite both words. We develop an inference method based on a variational expectation-maximization (VEM) algorithm to estimate the parameters of the model as well as to classify the words of the network. Then, we adopt a method based on maximizing an integrated classification likelihood (ICL) criterion to select the optimal model and the number of clusters. In addition, we develop a variational approach to analyze the given network, and we compare the two approaches. Applications based on real data are used to show the effectiveness of the two methods and to compare them. Finally, we develop an SBM with several attributes to deal with node-weighted networks. We motivate this approach by an application that aims at developing a tool to help specify the different cognitive processes performed by the brain during the preparation of writing.
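The weighted word graph described in this abstract (nodes are words, edge weights count the documents that contain both words) can be built as in the sketch below. The function name and the token-list input format are assumptions made for this illustration; the block covers only graph construction, not the stochastic block model inference.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(documents):
    """Count, for each pair of words, the documents containing both.

    documents: iterable of token lists (hypothetical input format).
    Returns a Counter mapping alphabetically ordered word pairs to
    their edge weight.
    """
    edges = Counter()
    for doc in documents:
        # Use a set so a word repeated within one document counts once.
        words = sorted(set(doc))
        for pair in combinations(words, 2):
            edges[pair] += 1
    return edges
```

Sorting the words of each document before pairing them gives every undirected edge a single canonical key, so ("graph", "text") and ("text", "graph") are never counted separately.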
Maxey, Craig. "Free-text disease classification". Thesis, Monterey, California. Naval Postgraduate School, 2011. http://hdl.handle.net/10945/5554.
Modern medicine produces data with every patient interaction. While many data elements are easily captured and analyzed, the fundamental record of the patient/clinician interaction is captured in written free text. This thesis provides the foundation for the Military Health System to begin building an automatic classifier for ICD9 diagnostic codes based on free-text clinician notes. Support Vector Machine models are fit to approximately 84,000 free-text records, providing a means to predict ICD9 codes for other free-text records. While the research conducted in this thesis does not provide a consummate ICD9 classification model, it does provide the foundation required for further, more detailed analysis.
Dzunic, Zoran. "Text structure-aware classification". Thesis, Massachusetts Institute of Technology, 2009. http://hdl.handle.net/1721.1/53315.
Cataloged from PDF version of thesis.
Includes bibliographical references (p. 73-76).
Bag-of-words representations are used in many NLP applications, such as text classification and sentiment analysis. These representations ignore relations across different sentences in a text and disregard the underlying structure of documents. In this work, we present a method for text classification that takes into account document structure and only considers segments that contain information relevant for a classification task. In contrast to the previous work, which assumes that relevance annotation is given, we perform the relevance prediction in an unsupervised fashion. We develop a Conditional Bayesian Network model that incorporates relevance as a hidden variable of a target classifier. Relevance and label predictions are performed jointly, optimizing the relevance component for the best result of the target classifier. Our work demonstrates that incorporating structural information in document analysis yields significant performance gains over bag-of-words approaches on some NLP tasks.
by Zoran Dzunic.
S.M.
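To illustrate the limitation that the Dzunic abstract above attributes to bag-of-words representations, here is a minimal Python sketch. It is not taken from the thesis: the function name `bag_of_words`, the crude tokenizer, and the example sentences are all assumptions chosen for illustration. It shows that two texts containing the same words in a different arrangement collapse to the same representation.

```python
from collections import Counter
import re


def bag_of_words(text: str) -> Counter:
    """Return word counts for `text`, discarding order and sentence boundaries."""
    # Lowercase and keep alphabetic tokens only (a deliberately crude tokenizer).
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)


# Hypothetical example sentences: same words, opposite meaning.
a = bag_of_words("The drug helped. The placebo did not.")
b = bag_of_words("The placebo helped. The drug did not.")

# Both texts collapse to the same counts, so a classifier that sees only
# these counts cannot tell them apart.
same_representation = a == b
```

Here `same_representation` evaluates to `True`, which is the kind of lost cross-sentence structure that a structure-aware model is meant to recover.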
Baker, Simon. "Semantic text classification for cancer text mining". Thesis, University of Cambridge, 2018. https://www.repository.cam.ac.uk/handle/1810/275838.
Ghanmi, Nabil. "Segmentation d'images de documents manuscrits composites : application aux documents de chimie". Thesis, Université de Lorraine, 2016. http://www.theses.fr/2016LORR0109/document.
This thesis deals with chemistry document segmentation and structure analysis. The work aims to help chemists by providing information on experiments that have already been carried out. The documents are handwritten, heterogeneous, and written by multiple authors. Their physical structure is relatively simple: a succession of three regions comprising the chemical formula of the experiment, a table of the products used, and one or more text blocks describing the experimental procedure. Several difficulties are nevertheless encountered: the lines located at the region boundaries and the imperfections of the table layout make the separation task a real challenge. The proposed methodology takes these difficulties into account by performing segmentation at several levels and treating region separation as a classification problem. First, the document image is segmented into linear structures using an appropriate horizontal smoothing. The horizontal threshold, combined with a vertical overlap tolerance, favors the consolidation of fragmented elements of the formula without over-merging the text. These linear structures are classified as text or graphics based on discriminant structural features. Then, segmentation continues on the text lines to separate the rows of the table from the lines of the raw text blocks. For this classification, we propose a CRF model that determines the optimal labelling of the line sequence. The choice of this kind of model is motivated by its ability to absorb the variability of lines and to exploit contextual information. For the segmentation of the table into cells, we propose a hybrid method that includes two levels of analysis: structural and syntactic. The first relies on the presence of graphic lines and the alignment of both text and spaces. The second exploits the coherence of the cell content syntax. In this context, we propose a recognition-based approach that uses contextual knowledge to detect the numeric fields present in the table. The thesis was carried out within the CIFRE framework, in collaboration with the eNovalys company. We implemented and tested all the steps of the proposed system on a substantial dataset of chemistry documents.
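The horizontal smoothing step mentioned in the Ghanmi abstract above can be sketched as follows. The abstract does not say which smoothing algorithm is used, so this example assumes run-length smoothing, a common choice for binary document images. The function names `smooth_row` and `smooth_image`, the pixel encoding, and the threshold value are hypothetical and chosen only for illustration.

```python
from typing import List


def smooth_row(row: List[int], threshold: int) -> List[int]:
    """Fill background gaps of at most `threshold` pixels between ink pixels.

    `row` is one binary image row: 1 = ink, 0 = background.
    """
    out = list(row)
    last_ink = None  # index of the most recent ink pixel seen so far
    for i, pixel in enumerate(row):
        if pixel == 1:
            # Gap length is the number of background pixels between two ink pixels.
            if last_ink is not None and 0 < i - last_ink - 1 <= threshold:
                for j in range(last_ink + 1, i):
                    out[j] = 1
            last_ink = i
    return out


def smooth_image(image: List[List[int]], threshold: int) -> List[List[int]]:
    """Apply horizontal smoothing to every row of a binary image."""
    return [smooth_row(row, threshold) for row in image]


# Hypothetical row: two fragments 2 pixels apart, then a 5-pixel gap.
row = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1]
smoothed = smooth_row(row, threshold=3)
```

With a threshold of 3, the 2-pixel gap is filled and the 5-pixel gap is left open, so `smoothed` is `[1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1]`. This mirrors the trade-off the abstract describes: a threshold large enough to consolidate fragments of one element, but small enough not to merge separate elements.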
Olin, Per. "Evaluation of text classification techniques for log file classification". Thesis, Linköpings universitet, Institutionen för datavetenskap, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-166641.
Prabowo, Rudy. "Ontology-based automatic text classification". Thesis, University of Wolverhampton, 2005. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.418665.
Eriksson, Linus and Kevin Frejdh. "Swedish biomedical text-mining and classification". Thesis, KTH, Hälsoinformatik och logistik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-278067.
Abstract: Manual classification of text is time-consuming and costly, yet it is a necessity in fields such as biomedicine in order to quantify the processing of data. This study investigated two alternative ways of producing, without access to large amounts of data, text classification models that can understand and classify biomedical text. The study examined whether a specialized model should be considered a requirement for this, or whether a generic model can suffice. Both models used were based on publicly available versions, one trained to understand English biomedical text and the other trained to understand ordinary Swedish text. The Swedish model was introduced to a new domain of text, while the English model worked on translated Swedish texts. The results showed that the Swedish model could understand and classify the text almost twice as effectively as the English one, though with a relatively low degree of accuracy. Finally, it could be concluded that the method used showed potential for training models, and that when larger datasets are lacking, generally trained models should be usable as a base that is then specialized to other domains. Keywords: NLP, text mining, biomedical texts, classification, labelling, models, BERT, machine learning, FIC, ICF.
Whitelaw, Casey. "Systemic features for text classification". Thesis, The University of Sydney, 2005. https://hdl.handle.net/2123/28097.
Gheldof, Tom. "Trismegistos". Universitätsbibliothek Leipzig, 2016. http://nbn-resolving.de/urn:nbn:de:bsz:15-qucosa-201617.