Dissertations / Theses on the topic 'Fouille de règle'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the top 50 dissertations / theses for your research on the topic 'Fouille de règle.'
Next to every source in the list of references, there is an 'Add to bibliography' button. Press on it, and we will generate automatically the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as pdf and read online its abstract whenever available in the metadata.
Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.
Hamrouni, Tarek. "Fouille de représentations concises des motifs fréquents à travers les espaces de recherche conjonctif et disjonctif." Phd thesis, Université d'Artois, 2009. http://tel.archives-ouvertes.fr/tel-00465733.
Full textFrancisci, Dominique. "Techniques d'optimisation pour la fouille de données." Phd thesis, Université de Nice Sophia-Antipolis, 2004. http://tel.archives-ouvertes.fr/tel-00216131.
Full textSimonne, Lucas. "Mining differential causal rules in knowledge graphs." Electronic Thesis or Diss., université Paris-Saclay, 2023. http://www.theses.fr/2023UPASG008.
Full textThe mining of association rules within knowledge graphs is an important area of research.Indeed, this type of rule makes it possible to represent knowledge, and their application makes it possible to complete a knowledge graph by adding missing triples or to remove erroneous triples.However, these rules express associations and do not allow the expression of causal relations, whose semantics differ from an association or a correlation.In a system, a causal link between variable A and variable B is a relationship oriented from A to B. It indicates that a change in A causes a change in B, with the other variables in the system maintaining the same values.Several frameworks exist for determining causal relationships, including the potential outcome framework, which involves matching similar instances with different values on a variable named treatment to study the effect of that treatment on another variable named the outcome.In this thesis, we propose several approaches to define rules representing a causal effect of a treatment on an outcome.This effect can be local, i.e., valid for a subset of instances of a knowledge graph defined by a graph pattern, or average, i.e., valid on average for the whole set of graph instances.The discovery of these rules is based on the framework of studying potential outcomes by matching similar instances and comparing their RDF descriptions or their learned vectorial representations through graph embedding models
Boudane, Abdelhamid. "Fouille de données par contraintes." Thesis, Artois, 2018. http://www.theses.fr/2018ARTO0403/document.
Full textIn this thesis, We adress the well-known clustering and association rules mining problems. Our first contribution introduces a new clustering framework, where complex objects are described by propositional formulas. First, we extend the two well-known k-means and hierarchical agglomerative clustering techniques to deal with these complex objects. Second, we introduce a new divisive algorithm for clustering objects represented explicitly by sets of models. Finally, we propose a propositional satisfiability based encoding of the problem of clustering propositional formulas without the need for an explicit representation of their models. In a second contribution, we propose a new propositional satisfiability based approach to mine association rules in a single step. The task is modeled as a propositional formula whose models correspond to the rules to be mined. To highlight the flexibility of our proposed framework, we also address other variants, namely the closed, minimal non-redundant, most general and indirect association rules mining tasks. Experiments on many datasets show that on the majority of the considered association rules mining tasks, our declarative approach achieves better performance than the state-of-the-art specialized techniques
Idoudi, Rihab. "Fouille de connaissances en diagnostic mammographique par ontologie et règles d'association." Thesis, Ecole nationale supérieure Mines-Télécom Atlantique Bretagne Pays de la Loire, 2017. http://www.theses.fr/2017IMTA0005/document.
Full textFacing the significant complexity of the mammography area and the massive changes in its data, the need to contextualize knowledge in a formal and comprehensive modeling is becoming increasingly urgent for experts. It is within this framework that our thesis work focuses on unifying different sources of knowledge related to the domain within a target ontological modeling. On the one hand, there is, nowadays, several mammographic ontological modeling, where each resource has a distinct perspective area of interest. On the other hand, the implementation of mammography acquisition systems makes available a large volume of information providing a decisive competitive knowledge. However, these fragments of knowledge are not interoperable and they require knowledge management methodologies for being comprehensive. In this context, we are interested on the enrichment of an existing domain ontology through the extraction and the management of new knowledge (concepts and relations) derived from two scientific currents: ontological resources and databases holding with past experiences. Our approach integrates two knowledge mining levels: The first module is the conceptual target mammographic ontology enrichment with new concepts extracting from source ontologies. This step includes three main stages: First, the stage of pre-alignment. The latter consists on building for each input ontology a hierarchy of fuzzy conceptual clusters. The goal is to reduce the alignment task from two full ontologies to two reduced conceptual clusters. The second stage consists on aligning the two hierarchical structures of both source and target ontologies. Thirdly, the validated alignments are used to enrich the reference ontology with new concepts in order to increase the granularity of the knowledge base. The second level of management is interested in the target mammographic ontology relational enrichment by novel relations deducted from domain database. The latter includes medical records of mammograms collected from radiology services. This section includes four main steps: i) the preprocessing of textual data ii) the application of techniques for data mining (or knowledge extraction) to extract new associations from past experience in the form of rules, iii) the post-processing of the generated rules. The latter is to filter and classify the rules in order to facilitate their interpretation and validation by expert, vi) The enrichment of the ontology by new associations between concepts. This approach has been implemented and validated on real mammographic ontologies and patient data provided by Taher Sfar and Ben Arous hospitals. The research work presented in this manuscript relates to knowledge using and merging from heterogeneous sources in order to improve the knowledge management process
Guillet, Fabrice. "Qualité, Fouille et Gestion des Connaissances." Habilitation à diriger des recherches, Université de Nantes, 2006. http://tel.archives-ouvertes.fr/tel-00481938.
Full textNguyen, Thi Kim Ngan. "Generalizing association rules in n-ary relations : application to dynamic graph analysis." Phd thesis, INSA de Lyon, 2012. http://tel.archives-ouvertes.fr/tel-00995132.
Full textPapon, Pierre-Antoine. "Extraction optimisée de règles d'association positives et négatives intéressantes." Thesis, Clermont-Ferrand 2, 2016. http://www.theses.fr/2016CLF22702/document.
Full textThe purpose of data mining is to extract knowledge from large amount of data. The extracted knowledge can take different forms. In this work, we will seek to extract knowledge only in the form of positive association rules and negative association rules. A negative association rule is a rule in which the presence and the absence of a variable can be used. When considering the absence of variables in the study, we will expand the semantics of knowledge and extract undetectable information by the positive association rules mining methods. This will, for example allow doctors to find characteristics that prevent disease instead of searching characteristics that cause a disease. Nevertheless, adding the negation will cause various challenges. Indeed, as the absence of a variable is usually more important than the presence of these same variables, the computational costs will increase exponentially and the risk to extract a prohibitive number of rules, which are mostly redundant and uninteresting, will also increase. In order to address these problems, our proposal, based on the famous Apriori algorithm, does not rely on frequent itemsets as other methods do. We define a new type of itemsets : the reasonably frequent itemsets which will improve the quality of the rules. We also rely on the M G measure to know which forms of rules should be mined but also to remove uninteresting rules. We also use meta-rules to allow us to infer the interest of a negative rule from a positive one. Moreover, our algorithm will extract a new type of negative rules that seems interesting : the rules for which the antecedent and the consequent are conjunctions of negative itemsets. Our study ends with a quantitative and qualitative comparison with other positive and negative association rules mining algorithms on various databases of the literature. Our software ARA (Association Rules Analyzer ) facilitates the qualitative analysis of the algorithms by allowing to compare intuitively the algorithms and to apply in post-process treatments various quality measures. Finally, our proposal improves the extraction in the number and the quality of the extracted rules but also in the rules search path
David, Jérôme. "AROMA : une méthode pour la découverte d'alignements orientés entre ontologies à partir de règles d'association." Phd thesis, Université de Nantes, 2007. http://tel.archives-ouvertes.fr/tel-00200040.
Full textDans la littérature, la plupart des travaux traitant des méthodes d'alignement d'ontologies ou de schémas s'appuient sur une définition intentionnelle des schémas et utilisent des relations basées sur des mesures de similarité qui ont la particularité d'être symétriques (équivalences). Afin d'améliorer les méthodes d'alignement, et en nous inspirant des travaux sur la découverte de règles d'association, des mesures de qualité associées, et sur l'analyse statistique implicative, nous proposons de découvrir des appariements asymétriques (implications) entre ontologies. Ainsi, la contribution principale de cette thèse concerne la conception d'une méthode d'alignement extensionnelle et orientée basée sur la découverte des implications significatives entre deux hiérarchies plantées dans un corpus textuel.
Notre méthode d'alignement se décompose en trois phases successives. La phase de prétraitement permet de préparer les ontologies à l'alignement en les redéfinissant sur un ensemble commun de termes extraits des textes et sélectionnés statistiquement. La phase de fouille extrait un alignement implicatif entre hiérarchies. La dernière phase de post-traitement des résultats permet de produire des alignements consistants et minimaux (selon un critère de redondance).
Les principaux apports de cette thèse sont : (1) Une modélisation de l'alignement étendue pour la prise en compte de l'implication. Nous définissons les notions de fermeture et couverture d'un alignement permettant de formaliser la redondance et la consistance d'un alignement. Nous étudions également la symétricité et les cardinalités d'un alignement. (2) La réalisation de la méthode AROMA et d'une interface d'aide à la validation d'alignements. (3) Une extension d'un modèle d'évaluation sémantique pour la prise en compte de la présence d'implications dans un alignement. (4) L'étude du comportement et de la performance d'AROMA sur différents types de jeux de tests (annuaires Web, catalogues et ontologies au format OWL) avec une sélection de six mesures de qualité.
Les résultats obtenus sont prometteurs car ils montrent la complémentarité de notre méthode avec les approches existantes.
Martin, Florent. "Pronostic de défaillances de pompes à vide - Exploitation automatique de règles extraites par fouille de données." Thesis, Grenoble, 2011. http://www.theses.fr/2011GRENA011.
Full textThis thesis presents a symbolic rule-based method that addresses system prognosis. It also details a successful application to complex vacuum pumping systems. More precisely, using historical vibratory data, we first model the behavior of the pumps by extracting a given type of episode rules, namely the First Local Maximum episode rules (FLM-rules). The algorithm that extracts FLM-rules also determines automatically their respective optimal temporal window, i.e. the temporal window in which the probability of observing the premiss and the conclusion of a rule is maximum. A subset of the extracted FLM-rules is then selected in order to further predict pumping system failures in a vibratory data stream context. Our contribution consists in selecting the most reliable FLM-rules, continuously matching them in a data stream of vibratory data and building a forecast time interval using the optimal temporal windows of the FLM-rules that have been matched
Mondal, Kartick Chandra. "Algorithmes pour la fouille de données et la bio-informatique." Thesis, Nice, 2013. http://www.theses.fr/2013NICE4049.
Full textKnowledge pattern extraction is one of the major topics in the data mining and background knowledge integration domains. Out of several data mining techniques, association rule mining and bi-clustering are two major complementary tasks for these topics. These tasks gained much importance in many domains in recent years. However, no approach was proposed to perform them in one process. This poses the problems of resources required (memory, execution times and data accesses) to perform independent extractions and of the unification of the different results. We propose an original approach for extracting different categories of knowledge patterns while using minimum resources. This approach is based on the frequent closed patterns theoretical framework and uses a novel suffix-tree based data structure to extract conceptual minimal representations of association rules, bi-clusters and classification rules. These patterns extend the classical frameworks of association and classification rules, and bi-clusters as data objects supporting each pattern and hierarchical relationships between patterns are also extracted. This approach was applied to the analysis of HIV-1 and human protein-protein interaction data. Analyzing such inter-species protein interactions is a recent major challenge in computational biology. Databases integrating heterogeneous interaction information and biological background knowledge on proteins have been constructed. Experimental results show that the proposed approach can efficiently process these databases and that extracted conceptual patterns can help the understanding and analysis of the nature of relationships between interacting proteins
Fu, Huaiguo. "Algorithmique des treillis de concepts : application à la fouille de données." Artois, 2005. http://www.theses.fr/2005ARTO0401.
Full textOur main concern in this thesis is concept (or galois) lattices and its application to data mining. We achieve a comparison of different concept lattices algorithms on benchmarks taken from UCI. During this comparison, we analyse the duality phenomenon between objects and attributes on each algorithm performance. This analysis allows to show that the running time of an algorithm may considerably vary when using the formal context or the transposed context. Using the Divide-and-Conquer paradigm, we design a new concept lattice algorithm, ScalingNextClosure, which decomposes the search space in many partitions and builds formal concepts for each partition independently. By reducing the search space, ScalingNextClosure can deal efficiently with few memory space and thus treat huge formal context, but only if the whole context can be loaded in the memory. An experimental comparison between NextClosure and ScalingNextClosure shows the efficiency of such decomposition approach. In any huge dataset, ScalingNextClosure runs faster than NextClosure on a sequential machine, with an average win factor equal to 10. Another advantage of ScalingNextClosure is that it can be easily implemented on a distributed or parallel architecture. Mining frequent closed itemsets (FCI) is a subproblem of mining association rules. We adapt ScalingNextClosure to mine frequent closed itemsets, and design a new algorithm, called PFC. PFC uses the support measure to prune the search space within one partition. An experimental comparison conducted on a sequential architecture, between PFC with one of the efficient FCI system, is discussed
Bothorel, Gwenael. "Algorithmes automatiques pour la fouille visuelle de données et la visualisation de règles d’association : application aux données aéronautiques." Phd thesis, Toulouse, INPT, 2014. http://oatao.univ-toulouse.fr/13783/1/bothorel.pdf.
Full textLehn, Rémi. "Un système interactif de visualisation et de fouille de règles pour l'extraction de connaissances dans les bases de données." Nantes, 2000. http://www.theses.fr/2000NANT2110.
Full textKirchgessner, Martin. "Fouille et classement d'ensembles fermés dans des données transactionnelles de grande échelle." Thesis, Université Grenoble Alpes (ComUE), 2016. http://www.theses.fr/2016GREAM060/document.
Full textThe recent increase of data volumes raises new challenges for itemset mining algorithms. In this thesis, we focus on transactional datasets (collections of items sets, for example supermarket tickets) containing at least a million transactions over hundreds of thousands items. These datasets usually follow a "long tail" distribution: a few items are very frequent, and most items appear rarely. Such distributions are often truncated by existing itemset mining algorithms, whose results concern only a very small portion of the available items (the most frequents, usually). Thus, existing methods fail to concisely provide relevant insights on large datasets. We therefore introduce a new semantics which is more intuitive for the analyst: browsing associations per item, for any item, and less than a hundred associations at once.To address the items' coverage challenge, our first contribution is the item-centric mining problem. It consists in computing, for each item in the dataset, the k most frequent closed itemsets containing this item. We present an algorithm to solve it, TopPI. We show that TopPI computes efficiently interesting results over our datasets, outperforming simpler solutions or emulations based on existing algorithms, both in terms of run-time and result completeness. We also show and empirically validate how TopPI can be parallelized, on multi-core machines and on Hadoop clusters, in order to speed-up computation on large scale datasets.Our second contribution is CAPA, a framework allowing us to study which existing measures of association rules' quality are relevant to rank results. This concerns results obtained from TopPI or from jLCM, our implementation of a state-of-the-art frequent closed itemsets mining algorithm (LCM). Our quantitative study shows that the 39 quality measures we compare can be grouped into 5 families, based on the similarity of the rankings they produce. We also involve marketing experts in a qualitative study, in order to discover which of the 5 families we propose highlights the most interesting associations for their domain.Our close collaboration with Intermarché, one of our industrial partners in the Datalyse project, allows us to show extensive experiments on real, nation-wide supermarket data. We present a complete analytics workflow addressing this use case. We also experiment on Web data. Our contributions can be relevant in various other fields, thanks to the genericity of transactional datasets.Altogether our contributions allow analysts to discover associations of interest in modern datasets. We pave the way for a more reactive discovery of items' associations in large-scale datasets, whether on highly dynamic data or for interactive exploration systems
Couturier, Olivier. "Contribution à la fouille de données : règles d'association et interactivité au sein d'un processus d'extraction de connaissances dans les données." Artois, 2005. http://www.theses.fr/2005ARTO0410.
Full textSzathmary, Laszlo. "Méthodes symboliques de fouille de données avec la plate-forme Coron." Phd thesis, Université Henri Poincaré - Nancy I, 2006. http://tel.archives-ouvertes.fr/tel-00336374.
Full textLes contributions principales de cette thèse sont : (1) nous avons développé et adapté des algorithmes pour trouver les règles d'association minimales non-redondantes ; (2) nous avons défini une nouvelle base pour les règles d'associations appelée “règles fermées” ; (3) nous avons étudié un champ de l'ECBD important mais relativement peu étudié, à savoir l'extraction des motifs rares et des règles d'association rares ; (4) nous avons regroupé nos algorithmes et une collection d'autres algorithmes ainsi que d'autres opérations auxiliaires d'ECBD dans une boîte à outils logicielle appelée Coron.
Ramstein, Gérard. "Application de techniques de fouille de données en Bio-informatique." Habilitation à diriger des recherches, Université de Nantes, 2012. http://tel.archives-ouvertes.fr/tel-00706566.
Full textAzé, Jérôme. "Extraction de Connaissances à partir de Données Numériques et Textuelles." Phd thesis, Université Paris Sud - Paris XI, 2003. http://tel.archives-ouvertes.fr/tel-00011196.
Full textL'analyse de telles données est souvent contrainte par la définition d'un support minimal utilisé pour filtrer les connaissances non intéressantes.
Les experts des données ont souvent des difficultés pour déterminer ce support.
Nous avons proposé une méthode permettant de ne pas fixer un support minimal et fondée sur l'utilisation de mesures de qualité.
Nous nous sommes focalisés sur l'extraction de connaissances de la forme "règles d'association".
Ces règles doivent vérifier un ou plusieurs critères de qualité pour être considérées comme intéressantes et proposées à l'expert.
Nous avons proposé deux mesures de qualité combinant différents critères et permettant d'extraire des règles intéressantes.
Nous avons ainsi pu proposer un algorithme permettant d'extraire ces règles sans utiliser la contrainte du support minimal.
Le comportement de notre algorithme a été étudié en présence de données bruitées et nous avons pu mettre en évidence la difficulté d'extraire automatiquement des connaissances fiables à partir de données bruitées.
Une des solutions que nous avons proposée consiste à évaluer la résistance au bruit de chaque règle et d'en informer l'expert lors de l'analyse et de la validation des connaissances obtenues.
Enfin, une étude sur des données réelles a été effectuée dans le cadre d'un processus de fouille de textes.
Les connaissances recherchées dans ces textes sont des règles d'association entre des concepts définis par l'expert et propres au domaine étudié.
Nous avons proposé un outil permettant d'extraire les connaissances et d'assister l'expert lors de la validation de celles-ci.
Les différents résultats obtenus montrent qu'il est possible d'obtenir des connaissances intéressantes à partir de données textuelles en minimisant la sollicitation de l'expert dans la phase d'extraction des règles d'association.
Li, Dong Haoyuan. "Extraction de séquences inattendues : des motifs séquentiels aux règles d’implication." Montpellier 2, 2009. http://www.theses.fr/2009MON20253.
Full textThe sequential patterns can be viewed as an extension of the notion of association rules with integrating temporal constraints, which are effective for representing statistical frequency based behaviors between the elements contained in sequence data, that is, the discovered patterns are interesting because they are frequent. However, with considering prior domain knowledge of the data, another reason why the discovered patterns are interesting is because they are unexpected. In this thesis, we investigate the problems in the discovery of unexpected sequences in large databases with respect to prior domain expertise knowledge. We first methodically develop the framework Muse with integrating the approaches to discover the three forms of unexpected sequences. We then extend the framework Muse by adopting fuzzy set theory for describing sequence occurrence. We also propose a generalized framework SoftMuse with respect to the concept hierarchies on the taxonomy of data. We further propose the notions of unexpected sequential patterns and unexpected implication rules, in order to evaluate the discovered unexpected sequences by using a self-validation process. We finally propose the discovery and validation of unexpected sentences in free format text documents. The usefulness and effectiveness of our proposed approaches are shown with the experiments on synthetic data, real Web server access log data, and text document classification
Ben, Said Guefrech Zohra. "A virtual reality-based approach for interactive and visual mining of association rules." Nantes, 2012. http://archive.bu.univ-nantes.fr/pollux/show.action?id=359deab9-229a-4369-908d-bfbbe98adaea.
Full textThis thesis is at the intersection of two active research areas : Association Rules Mining and Virtual Reality. The main limitations of the association rule extraction algorithms are (i) the large amount of the generated rules and (ii) their low quality. Several solutions have been proposed to address this problem such as, the post-processing of association rules that allows rule validation and extraction of useful knowledge. Whereas rules are automatically extracted by combinatorial algorithms, rule post-processing is done by the user. Visualisation can help the user facing the large amount of rules by representing them in visual form. In order to find relevant knowledge in visual representations, the user needs to interact with these representations. To this aim, it is essential to provide the user with efficient interaction techniques. This work addresses two main issues : an association rule representation that allows the user quickly detection of the most interesting rules and interactive exploration of rules. The first issue requires an intuitive representation metaphor of association rules. The second requires an interactive exploration process allowing the user to explore the rule search space focusing on interesting rules. The main contributions of this work can be summarised as follows : – We propose a new classification for Visual Data Mining techniques, based on both 3D representations and interaction techniques. Such a classification helps the user choosing a visual representation and an interaction technique for his/her application. – We propose a new visualisation metaphor for association rules that takes into account the attributes of the rule, the contribution of each one, and their correlations. – We propose a methodology for interactive exploration of association rules to facilitate the user task facing large sets of rules taking into account his/her cognitive capabilities. In this methodology, local algorithms are used to recommend better rules based on a reference rule which is proposed by the user. Then, the user can both drives extraction and post-processing of rules using appropriate interaction operators. – We developed a tool that implements all the methodology functionality. The tool is based on an intuitive display in a virtual environment and supports multiple interaction methods
Cleuziou, Guillaume. "Une méthode de classification non-supervisée pour l'apprentissage de règles et la recherche d'information." Phd thesis, Université d'Orléans, 2004. http://tel.archives-ouvertes.fr/tel-00084828.
Full textNous proposons, dans cette étude, l'algorithme de clustering PoBOC permettant de structurer un ensemble d'objets en classes non-disjointes. Nous utilisons cette méthode de clustering comme outil de traitement dans deux applications très différentes.
- En apprentissage supervisé, l'organisation préalable des instances apporte une connaissance utile pour la tâche d'induction de règles propositionnelles et logiques.
- En Recherche d'Information, les ambiguïtés et subtilités de la langue naturelle induisent naturellement des recouvrements entre thématiques.
Dans ces deux domaines de recherche, l'intérêt d'organiser les objets en classes non-disjointes est confirmé par les études expérimentales adaptées.
Bouker, Slim. "Contribution à l'extraction des règles d'association basée sur des préférences." Thesis, Clermont-Ferrand 2, 2015. http://www.theses.fr/2015CLF22585/document.
Full textKouomou-Choupo, Anicet. "Améliorer la recherche par similarité dans une grande base d'images fixes par des techniques de fouilles de données." Phd thesis, Université Rennes 1, 2006. http://tel.archives-ouvertes.fr/tel-00524418.
Full textShahzad, Atif. "Une Approche Hybride de Simulation-Optimisation Basée sur la fouille de Données pour les problèmes d'ordonnancement." Phd thesis, Université de Nantes, 2011. http://tel.archives-ouvertes.fr/tel-00647353.
Full textCadot, Martine. "Extraire et valider les relations complexes en sciences humaines : statistiques, motifs et règles d'association." Phd thesis, Université de Franche-Comté, 2006. http://tel.archives-ouvertes.fr/tel-00594174.
Full textLi, Haoyuan. "Extraction de séquences inattendues : des motifs séquentiels aux règles d'implication." Phd thesis, Université Montpellier II - Sciences et Techniques du Languedoc, 2009. http://tel.archives-ouvertes.fr/tel-00431117.
Full textDiallo, Mouhamadou Saliou. "Découverte de règles de préférences contextuelles : application à la construction de profils utilisateurs." Thesis, Tours, 2015. http://www.theses.fr/2015TOUR4052/document.
Full textThe use of preferences arouses a growing interest to personalize response to requests and making targeted recommandations. Nevertheless, manual construction of preferences profiles remains complex and time-consuming. In this context, we present in this thesis a new automatic method for preferences elicitation based on data mining techniques. Our proposal is a two phase algorithm : (1) Extracting all contextual preferences rules from a set of user preferences and (2) Building user profile. At the end of the first phase, we notice that there is to much preference rules which satisfy the fixed constraints then in the second phase we eliminate the superfluous preferences rules. In our approach a user profile is constituted by the set of contextual preferences rules resulting of the second phase. A user profile must satisfy conciseness and soundness properties. The soundness property guarantees that the preference rules specifying the profiles are in agreement with a large set of the user preferences, and contradict a small number of them. On the other hand, conciseness implies that profiles are small sets of preference rules. We also proposed four predictions methods which use the extracted profiles. We validated our approach on a set of real-world movie rating datasets built from MovieLens and IMDB. The whole movie rating database consists of 800,156 votes from 6,040 users about 3,881 movies. The results of these experiments demonstrates that the conciseness of user profiles is controlled by the minimal agreement threshold and that even with strong reduction, the soundness of the profile remains at an acceptable level. These experiment also show that predictive qualities of some of our ranking strategies outperform SVMRank in several situations
Ben, Said Zohra. "A virtual reality-based approach for interactive and visual mining of association rules." Phd thesis, Université de Nantes, 2012. http://tel.archives-ouvertes.fr/tel-00829419.
Full textMecharnia, Thamer. "Approches sémantiques pour la prédiction de présence d'amiante dans les bâtiments : une approche probabiliste et une approche à base de règles." Electronic Thesis or Diss., université Paris-Saclay, 2022. http://www.theses.fr/2022UPASG036.
Full textNowadays, Knowledge Graphs are used to represent all kinds of data and they constitute scalable and interoperable resources that can be used by decision support tools. The Scientific and Technical Center for Building (CSTB) was asked to develop a tool to help identify materials containing asbestos in buildings. In this context, we have created and populated the ASBESTOS ontology which allows the representation of building data and the results of diagnostics carried out in order to detect the presence of asbestos in the used products. We then relied on this knowledge graph to develop two approaches which make it possible to predict the presence of asbestos in products in the absence of the reference of the marketed product actually used.The first approach, called the hybrid approach, is based on external resources describing the periods when the marketed products are asbestos-containing to calculate the probability of the existence of asbestos in a building component. This approach addresses conflicts between external resources, and incompleteness of listed data by applying a pessimistic fusion approach that adjusts the calculated probabilities using a subset of diagnostics.The second approach, called CRA-Miner, is inspired by inductive logic programming (ILP) methods to discover rules from the knowledge graph describing buildings and asbestos diagnoses. Since the reference of specific products used during construction is never specified, CRA-Miner considers temporal data, ASBESTOS ontology semantics, product types and contextual information such as part-of relations to discover a set of rules that can be used to predict the presence of asbestos in construction elements.The evaluation of the two approaches carried out on the ASBESTOS ontology populated with the data provided by the CSTB show that the results obtained, in particular when the two approaches are combined, are quite promising
Blanchard, Julien. "Un système de visualisation pour l'extraction, l'évaluation, et l'exploration interactives des règles d'association." Phd thesis, Université de Nantes, 2005. http://tel.archives-ouvertes.fr/tel-00421413.
Full textreprésentation de la connaissance en sciences cognitives. En fouille de données, la principale technique à base de règles est l'extraction de règles d'association, qui a donné lieu à de nombreux travaux de recherche.
La limite majeure des algorithmes d'extraction de règles d'association est qu'ils produisent communément de grandes quantités de règles, dont beaucoup se révèlent même sans aucun intérêt pour l'utilisateur. Ceci s'explique par la nature non supervisée de ces algorithmes : ne considérant aucune variable endogène, ils envisagent dans les règles toutes les combinaisons possibles de variables. Dans la pratique, l'utilisateur ne peut pas exploiter les résultats tels quels directement à la sortie des algorithmes. Un post-traitement consistant en une seconde opération de fouille se
révèle indispensable pour valider les volumes de règles et découvrir des connaissances utiles. Cependant, alors que la fouille de données est effectuée automatiquement par des algorithmes combinatoires, la fouille de règles est une
tâche laborieuse à la charge de l'utilisateur.
La thèse développe deux approches pour assister l'utilisateur dans le post-traitement des règles d'association :
– la mesure de la qualité des règles par des indices numériques,
– la supervision du post-traitement par une visualisation interactive.
Pour ce qui concerne la première approche, nous formalisons la notion d'indice de qualité de règles et réalisons une classification inédite des nombreux indices de la littérature, permettant d'aider l'utilisateur à choisir les indices pertinents pour son besoin. Nous présentons également trois nouveaux indices aux propriétés originales : l'indice
probabiliste d'écart à l'équilibre, l'intensité d'implication entropique, et le taux informationnel. Pour ce qui concerne la seconde approche, nous proposons une méthodologie de visualisation pour l'exploration interactive des règles. Elle
est conçue pour faciliter la tâche de l'utilisateur confronté à de grands ensembles de règles en prenant en compte ses capacités de traitement de l'information. Dans cette méthodologie, l'utilisateur dirige la découverte de connaissances
par des opérateurs de navigation adaptés en visualisant des ensembles successifs de règles décrits par des indices de qualité.
Les deux approches sont intégrées au sein de l'outil de visualisation ARVis (Association Rule Visualization) pour l'exploration interactive des règles d'association. ARVis implémente notre méthodologie au moyen d'une représentation
3D, inédite en visualisation de règles, mettant en valeur les indices de qualité. De plus, ARVis repose sur un algorithme spécifique d'extraction sous contraintes permettant de générer les règles interactivement au fur et à mesure de la navigation de l'utilisateur. Ainsi, en explorant les règles, l'utilisateur dirige à la fois l'extraction et le
post-traitement des connaissances.
Yahyaoui, Hasna. "Méthode d'analyse de données pour le diagnostic a posteriori de défauts de production - Application au secteur de la microélectronique." Thesis, Saint-Etienne, EMSE, 2015. http://www.theses.fr/2015EMSE0795/document.
Full textControlling the performance of a manufacturing site and the rapid identification of quality loss causes remain a daily challenge for manufacturers, who face continuing competition. In this context, this thesis aims to provide an analytical approach for the rapid identification of defect origins, by exploring data available thanks to different quality control systems, such FDC, metrology, parametric tests PT and the Electrical Wafer Sorting EWS. The proposed method, named CLARIF, combines three complementary data mining techniques namely clustering, association rules and decision trees induction. This method is based on unsupervised generation of a set of potentially problematic production modes, which are characterized by specific manufacturing conditions. Thus, we provide an analysis which descends to the level of equipment operating parameters. The originality of this method consists on (1) a pre-treatment step to identify spatial patterns from quality control data, (2) an unsupervised generation of manufacturing modes candidates to explain the quality loss case. We optimize the generation of association rules through the proposed ARCI algorithm, which is an adaptation of the famous association rules mining algorithm, APRIORI to integrate the constraints specific to our issue and filtering quality indicators, namely confidence, contribution and complexity, in order to identify the most interesting rules. Finally, we defined a Knowledge Discovery from Databases process, enabling to guide the user in applying CLARIF to explain both local and global quality loss problems
Kane, Mouhamadou bamba. "Extraction et sélection de motifs émergents minimaux : application à la chémoinformatique." Thesis, Normandie, 2017. http://www.theses.fr/2017NORMC223/document.
Full textPattern discovery is an important field of Knowledge Discovery in Databases.This work deals with the extraction of minimal emerging patterns. We propose a new efficientmethod which allows to extract the minimal emerging patterns with or without constraint ofsupport ; unlike existing methods that typically extract the most supported minimal emergentpatterns, at the risk of missing interesting but less supported patterns. Moreover, our methodtakes into account the absence of attribute that brings a new interesting knowledge.Considering the rules associated with emerging patterns highly supported as prototype rules,we have experimentally shown that this set of rules has good confidence on the covered objectsbut unfortunately does not cover a significant part of the objects ; which is a disavadntagefor their use in classification. We propose a prototype-based selection method that improvesthe coverage of the set of the prototype rules without a significative loss on their confidence.We apply our prototype-based selection method to a chemical data relating to the aquaticenvironment : Aquatox. In a classification context, it allows chemists to better explain theclassification of molecules, which, without this method of selection, would be predicted by theuse of a default rule
Raïssi, Chedy. "Extraction de Séquences Fréquentes : Des Bases de Données Statiques aux Flots de Données." Phd thesis, Université Montpellier II - Sciences et Techniques du Languedoc, 2008. http://tel.archives-ouvertes.fr/tel-00351626.
Full textLisa, Di Jorio. "Recherche de motifs graduels et application aux données médicales." Phd thesis, Université Montpellier II - Sciences et Techniques du Languedoc, 2010. http://tel.archives-ouvertes.fr/tel-00577212.
Full textSammouri, Wissam. "Data mining of temporal sequences for the prediction of infrequent failure events : application on floating train data for predictive maintenance." Thesis, Paris Est, 2014. http://www.theses.fr/2014PEST1041/document.
Full textIn order to meet the mounting social and economic demands, railway operators and manufacturers are striving for a longer availability and a better reliability of railway transportation systems. Commercial trains are being equipped with state-of-the-art onboard intelligent sensors monitoring various subsystems all over the train. These sensors provide real-time flow of data, called floating train data, consisting of georeferenced events, along with their spatial and temporal coordinates. Once ordered with respect to time, these events can be considered as long temporal sequences which can be mined for possible relationships. This has created a neccessity for sequential data mining techniques in order to derive meaningful associations rules or classification models from these data. Once discovered, these rules and models can then be used to perform an on-line analysis of the incoming event stream in order to predict the occurrence of target events, i.e, severe failures that require immediate corrective maintenance actions. The work in this thesis tackles the above mentioned data mining task. We aim to investigate and develop various methodologies to discover association rules and classification models which can help predict rare tilt and traction failures in sequences using past events that are less critical. The investigated techniques constitute two major axes: Association analysis, which is temporal and Classification techniques, which is not temporal. The main challenges confronting the data mining task and increasing its complexity are mainly the rarity of the target events to be predicted in addition to the heavy redundancy of some events and the frequent occurrence of data bursts. The results obtained on real datasets collected from a fleet of trains allows to highlight the effectiveness of the approaches and methodologies used
Fahed, Lina. "Prédire et influencer l'apparition des événements dans une séquence complexe." Thesis, Université de Lorraine, 2016. http://www.theses.fr/2016LORR0125/document.
Full textFor several years now, a new phenomenon related to digital data is emerging : data which are increasingly voluminous, varied and rapid, appears and becomes available, they are often referred to as complex data. In this dissertation, we focus on a particular type of data : complex sequence of events, by asking the following question : “how to predict as soon as possible and to influence the appearance of future events within a complex sequence of events?”. First of all, we focus on the problem of predicting events as soon as possible in a sequence of events. We propose DEER : an algorithm for mining episode rules, which has the originality of controlling the horizon of the appearance of future events by imposing a temporal distance within the extracted rules. In a second phase, we address the problem of emergence detection in an events stream. We propose EER : an algorithm for detecting new emergent rules as soon as possible. In order to increase the reliability of new rules, EER relies on the similarity between theses rules and previously extracted rules. At last, we study the impact carried by events on other events within a sequence of events. We propose IE : an algorithm that introduces the concept of “influencer events” and studies the influence on the support, on the confidence and on the distance through three proposed measures. Our work is evaluated and validated through an experimental study carried on a real data set of blogs messages
Khajeh, Nassiri Armita. "Expressive Rule Discovery for Knowledge Graph Refinement." Electronic Thesis or Diss., université Paris-Saclay, 2023. http://www.theses.fr/2023UPASG045.
Full textKnowledge graphs (KGs) are heterogeneous graph structures representing facts in a machine-readable format. They find applications in tasks such as question answering, disambiguation, and entity linking. However, KGs are inherently incomplete, and refining them is crucial to improve their effectiveness in downstream tasks. It's possible to complete the KGs by predicting missing links within a knowledge graph or integrating external sources and KGs. By extracting rules from the KG, we can leverage them to complete the graph while providing explainability. Various approaches have been proposed to mine rules efficiently. Yet, the literature lacks effective methods for effectively incorporating numerical predicates in rules. To address this gap, we propose REGNUM, which mines numerical rules with interval constraints. REGNUM builds upon the rules generated by an existing rule mining system and enriches them by incorporating numerical predicates guided by quality measures. Additionally, the interconnected nature of web data offers significant potential for completing and refining KGs, for instance, by data linking, which is the task of finding sameAs links between entities of different KGs. We introduce RE-miner, an approach that mines referring expressions (REs) for a class in a knowledge graph and uses them for data linking. REs are rules that are only applied to one entity. They support knowledge discovery and serve as an explainable way to link data. We employ pruning strategies to explore the search space efficiently, and we define characteristics to generate REs that are more relevant for data linking. Furthermore, we aim to explore the advantages and opportunities of fine-tuning language models to bridge the gap between KGs and textual data. We propose GilBERT, which leverages fine-tuning techniques on language models like BERT using a triplet loss. GilBERT demonstrates promising results for refinement tasks of relation prediction and triple classification tasks. By considering these challenges and proposing novel approaches, this thesis contributes to KG refinement, particularly emphasizing explainability and knowledge discovery. The outcomes of this research open doors to more research questions and pave the way for advancing towards more accurate and comprehensive KGs
Bosc, Guillaume. "Anytime discovery of a diverse set of patterns with Monte Carlo tree search." Thesis, Lyon, 2017. http://www.theses.fr/2017LYSEI074/document.
Full textThe discovery of patterns that strongly distinguish one class label from another is still a challenging data-mining task. Subgroup Discovery (SD) is a formal pattern mining framework that enables the construction of intelligible classifiers, and, most importantly, to elicit interesting hypotheses from the data. However, SD still faces two major issues: (i) how to define appropriate quality measures to characterize the interestingness of a pattern; (ii) how to select an accurate heuristic search technique when exhaustive enumeration of the pattern space is unfeasible. The first issue has been tackled by Exceptional Model Mining (EMM) for discovering patterns that cover tuples that locally induce a model substantially different from the model of the whole dataset. The second issue has been studied in SD and EMM mainly with the use of beam-search strategies and genetic algorithms for discovering a pattern set that is non-redundant, diverse and of high quality. In this thesis, we argue that the greedy nature of most such previous approaches produces pattern sets that lack diversity. Consequently, we formally define pattern mining as a game and solve it with Monte Carlo Tree Search (MCTS), a recent technique mainly used for games and planning problems in artificial intelligence. Contrary to traditional sampling methods, MCTS leads to an any-time pattern mining approach without assumptions on either the quality measure or the data. It converges to an exhaustive search if given enough time and memory. The exploration/exploitation trade-off allows the diversity of the result set to be improved considerably compared to existing heuristics. We show that MCTS quickly finds a diverse pattern set of high quality in our application in neurosciences. We also propose and validate a new quality measure especially tuned for imbalanced multi-label data
Nouvel, Damien. "Reconnaissance des entités nommées par exploration de règles d'annotation - Interpréter les marqueurs d'annotation comme instructions de structuration locale." Phd thesis, Université François Rabelais - Tours, 2012. http://tel.archives-ouvertes.fr/tel-00788630.
Full textRaissi, Chedy. "Extraction de séquences fréquentes : des bases de données statiques aux flots de données." Montpellier 2, 2008. http://www.theses.fr/2008MON20063.
Full textRosá, Aiala. "Identification de opiniónes de differentes fuentes en textos en español." Thesis, Paris 10, 2011. http://www.theses.fr/2011PA100127.
Full textThis work presents a study of linguistic expressions of opinion from different sources in Spanish texts. The work includes the definition of a model for opinion predicates and their arguments (source, topic and message), the creation of a lexicon of opinion predicates which have information from the model associated, and the implementation of three systems.The first system, based on contextual rules, gets good results for the F-measure score (partial match): predicate, 92%; source, 81%; topic, 75%; message, 89%; full opinion, 85%. In addition, for source identification the F-measure for exact match is 79%. The second system, based on Conditional Random Fields (CRF), was developed only for the identification of sources, giving 76% of F-measure (exact match). The third system, which combines the two techniques (rules and CRF), gives a value of 83% of F-measure (exact match), showing that the combination yields interesting results.As regards the identification of sources, our system compared to other work developed for languages other than Spanish, gives very satisfactory results. Indeed these works had scores that fall between 63% and 89.5%.Moreover, in addition to the systems made for the identification of opinions, our work has led to the construction of several resources for Spanish: a lexicon of opinion predicates, a 13,000 words corpus with opinions annotated and a 40,000 words corpus with opinion predicates end sources annotated
Reynaud, Justine. "Découverte de définitions dans le web des données." Electronic Thesis or Diss., Université de Lorraine, 2019. http://www.theses.fr/2019LORR0160.
Full textIn this thesis, we are interested in the web of data and knowledge units that can be possibly discovered inside. The web of data can be considered as a very large graph consisting of connected RDF triple databases. An RDF triple, denoted as (subject, predicate, object), represents a relation (i.e. the predicate) existing between two resources (i.e. the subject and the object). Resources can belong to one or more classes, where a class aggregates resources sharing common characteristics. Thus, these RDF triple databases can be seen as interconnected knowledge bases. Most of the time, these knowledge bases are collaboratively built thanks to human users. This is particularly the case of DBpedia, a central knowledge base within the web of data, which encodes Wikipedia content in RDF format. DBpedia is built from two types of Wikipedia data: on the one hand, (semi-)structured data such as infoboxes, and, on the other hand, categories, which are thematic clusters of manually generated pages. However, the semantics of categories in DBpedia, that is, the reason a human agent has bundled resources, is rarely made explicit. In fact, considering a class, a software agent has access to the resources that are regrouped together, i.e. the class extension, but it generally does not have access to the ``reasons'' underlying such a cluster, i.e. it does not have the class intension. Considering a category as a class of resources, we aim at discovering an intensional description of the category. More precisely, given a class extension, we are searching for the related intension. The pair (extension, intension) which is produced provides the final definition and the implementation of classification-based reasoning for software agents. This can be expressed in terms of necessary and sufficient conditions: if x belongs to the class C, then x has the property P (necessary condition), and if x has the property P, then it belongs to the class C (sufficient condition). Two complementary data mining methods allow us to materialize the discovery of definitions, the search for association rules and the search for redescriptions. In this thesis, we first present a state of the art about association rules and redescriptions. Next, we propose an adaptation of each data mining method for the task of definition discovery. Then we detail a set of experiments applied to DBpedia, and we qualitatively and quantitatively compare the two approaches. Finally, we discuss how discovered definitions can be added to DBpedia to improve its quality in terms of consistency and completeness
Ahmadi, Naser. "A framework for the continuous curation of a knowledge base system." Electronic Thesis or Diss., Sorbonne université, 2021. http://www.theses.fr/2021SORUS320.
Full textEntity-centric knowledge graphs (KGs) are becoming increasingly popular for gathering information about entities. The schemas of KGs are semantically rich, with many different types and predicates to define the entities and their relationships. These KGs contain knowledge that requires understanding of the KG’s structure and patterns to be exploited. Their rich data structure can express entities with semantic types and relationships, oftentimes domain-specific, that must be made explicit and understood to get the most out of the data. Although different applications can benefit from such rich structure, this comes at a price. A significant challenge with KGs is the quality of their data. Without high-quality data, the applications cannot use the KG. However, as a result of the automatic creation and update of KGs, there are a lot of noisy and inconsistent data in them and, because of the large number of triples in a KG, manual validation is impossible. In this thesis, we present different tools that can be utilized in the process of continuous creation and curation of KGs. We first present an approach designed to create a KG in the accounting field by matching entities. We then introduce methods for the continuous curation of KGs. We present an algorithm for conditional rule mining and apply it on large graphs. Next, we describe RuleHub, an extensible corpus of rules for public KGs which provides functionalities for the archival and the retrieval of rules. We also report methods for using logical rules in two different applications: teaching soft rules to pre-trained language models (RuleBert) and explainable fact checking (ExpClaim)
Cherfi, Hacène. "Etude et réalisation d'un système d'extraction de connaissances à partir de textes." Phd thesis, Université Henri Poincaré - Nancy I, 2004. http://tel.archives-ouvertes.fr/tel-00011195.
Full textL'utilisation d'un modèle de connaissances vient appuyer et surtout compléter cette première approche. Il est montré, par la définition d'une mesure de vraisemblance, l'intérêt de découvrir de nouvelles connaissances en écartant les connaissances déjà répertoriées et décrites par un modèle de connaissances du domaine. Les règles d'association peuvent donc être utilisées pour alimenter un modèle de connaissances terminologiques du domaine des textes choisi. La thèse inclut la réalisation d'un système appelé TAMIS : "Text Analysis by Mining Interesting ruleS" ainsi qu'une expérimentation et une validation sur des données réelles de résumés de textes en biologie moléculaire.
Salem, Rashed. "Active XML Data Warehouses for Intelligent, On-line Decision Support." Thesis, Lyon 2, 2012. http://www.theses.fr/2012LYO22002.
Full textA decision support system (DSS) is an information system that supports decisionmakers involved in complex decision-making processes. Modern DSSs needto exploit data that are not only numerical or symbolic, but also heterogeneouslystructured (e.g., text and multimedia data) and coming from various sources (e.g,the Web). We term such data complex data. Data warehouses are casually usedas the basis of such DSSs. They help integrate data from a variety of sourcesto support decision-making. However, the advent of complex data imposes anothervision of data warehousing including data integration, data storage and dataanalysis. Moreover, today's requirements impose integrating complex data in nearreal-time rather than with traditional snapshot and batch ETL (Extraction, Transformationand Loading). Real-time and near real-time processing requires a moreactive ETL process. Data integration tasks must react in an intelligent, i.e., activeand autonomous way, to encountered changes in the data integration environment,especially data sources.In this dissertation, we propose novel solutions for complex data integration innear real-time, actively and autonomously. We indeed provide a generic metadatabased,service-oriented and event-driven approach for integrating complex data.To address data complexity issues, our approach stores heterogeneous data into aunied format using a metadata-based approach and XML. We also tackle datadistribution and interoperability using a service-oriented approach. Moreover, toaddress near real-time requirements, our approach stores not only integrated datainto a unied repository, but also functions to integrate data on-the-y. We also apply a service-oriented approach to track relevant data changes in near real-time.Furthermore, the idea of integrating complex data actively and autonomously revolvesaround mining logged events of data integration environment. For this sake,we propose an incremental XML-based algorithm for mining association rules fromlogged events. Then, we de ne active rules upon mined data to reactivate integrationtasks.To validate our approach for managing complex data integration, we develop ahigh-level software framework, namely AX-InCoDa (Active XML-based frameworkfor Integrating Complex Data). AX-InCoDa is implemented as Web application usingopen-source tools. It exploits Web standards (e.g., XML and Web services) andActive XML to handle complexity issues and near real-time requirements. Besidewarehousing logged events into an event repository to be mined for self-managingpurposes, AX-InCoDa is enriched with active rules. AX-InCoDa's feasibility is illustratedby a healthcare case study. Finally, the performance of our incremental eventmining algorithm is experimentally demonstrated
Wajnberg, Mickaël. "Analyse relationnelle de concepts : une méthode polyvalente pour l'extraction de connaissances." Electronic Thesis or Diss., Université de Lorraine, 2020. http://www.theses.fr/2020LORR0136.
Full textAt a time where data, often interpreted as "ground truth", are produced in gigantic quantities, a need for understanding and interpretability emerges in parallel. Dataset are nowadays mainly relational, therefore developping methods that allows relevant information extraction describing both objects and relation among them is a necessity. Association rules, along with their support and confidence metrics, describe co-occurrences of object features, hence explicitly express and evaluate any information contained in a dataset. In this thesis, we present and develop the relational concept analysis approach to extract the association rules that translate objects proper features along with the links with sets of objects. A first part present the mathematical part of the method, while a second part highlights three case studies to assess the pertinence of such a development. Case studies cover various domains to demonstrate the method polyvalence: the first case deals with error analysis in industrial production, the second covers psycholinguistics for dictionary analysis and the last one shows the method application in knowledge engineering
Fiot, Céline. "Extraction de séquences fréquentes : des données numériques aux valeurs manquantes." Phd thesis, Université Montpellier II - Sciences et Techniques du Languedoc, 2007. http://tel.archives-ouvertes.fr/tel-00179506.
Full textFiot, Céline. "Extraction de séquences fréquentes : des données numériques aux valeurs manquantes." Phd thesis, Montpellier 2, 2007. http://www.theses.fr/2007MON20056.
Full textLamirel, Jean-Charles. "Vers une approche systémique et multivues pour l'analyse de données et la recherche d'information : un nouveau paradigme." Habilitation à diriger des recherches, Université Nancy II, 2010. http://tel.archives-ouvertes.fr/tel-00552247.
Full text