Theses on the topic "Fouille de données hybride"
Create an accurate citation in APA, MLA, Chicago, Harvard, and other styles
Consult the top 50 theses for your research on the topic "Fouille de données hybride".
Next to every source in the list of references there is an "Add to bibliography" button. Press it, and we will automatically generate the bibliographic reference for the chosen work in whichever citation style you need: APA, MLA, Harvard, Vancouver, Chicago, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.
Explore theses on a wide variety of disciplines and organize your bibliography correctly.
Shahzad, Atif. "Une Approche Hybride de Simulation-Optimisation Basée sur la fouille de Données pour les problèmes d'ordonnancement". PhD thesis, Université de Nantes, 2011. http://tel.archives-ouvertes.fr/tel-00647353.
Shahzad, Muhammad Atif. "Une approche hybride de simulation-optimisation basée sur la fouille de données pour les problèmes d'ordonnancement". Nantes, 2011. http://archive.bu.univ-nantes.fr/pollux/show.action?id=53c8638a-977a-4b85-8c12-6dc88d92f372.
A data mining based approach to discover previously unknown priority dispatching rules for the job shop scheduling problem is presented. This approach is based upon seeking the knowledge that is assumed to be embedded in the efficient solutions provided by the optimization module built using tabu search. The objective is to discover the scheduling concepts using data mining and hence to obtain a set of rules capable of approximating the efficient solutions for a job shop scheduling problem (JSSP). A data mining based scheduling framework is presented and implemented for a job shop problem with maximum lateness and mean tardiness as the scheduling objectives. The results obtained are very promising.
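To make the scheduling objective concrete, here is a minimal sketch (not Shahzad's simulation-optimization framework, and with invented jobs) of how a priority dispatching rule is evaluated on a single machine under the mean-tardiness criterion:

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    processing_time: int
    due_date: int

def mean_tardiness(jobs, priority_key):
    """Dispatch jobs on a single machine in priority order and return
    the mean tardiness, max(0, completion time - due date)."""
    clock, total = 0, 0
    for job in sorted(jobs, key=priority_key):
        clock += job.processing_time
        total += max(0, clock - job.due_date)
    return total / len(jobs)

# Hypothetical jobs and two classical rules a mined rule would compete with:
jobs = [Job("A", 4, 12), Job("B", 2, 3), Job("C", 6, 7)]
spt = mean_tardiness(jobs, lambda j: j.processing_time)  # shortest processing time first
edd = mean_tardiness(jobs, lambda j: j.due_date)         # earliest due date first
```

The data mining step would then learn, from many optimized schedules, which rule (or composite of rules) to apply in which shop state.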
Theobald, Claire. "Bayesian Deep Learning for Mining and Analyzing Astronomical Data". Electronic Thesis or Diss., Université de Lorraine, 2023. http://www.theses.fr/2023LORR0081.
In this thesis, we address the issue of trust in deep learning predictive systems along two complementary research directions. The first line of research focuses on the ability of an AI to estimate its level of uncertainty in its decision-making as accurately as possible. The second line focuses on the explainability of these systems, that is, their ability to convince human users of the soundness of their predictions. The problem of estimating the uncertainties is addressed from the perspective of Bayesian deep learning. Bayesian neural networks assume a probability distribution over their parameters, which allows them to estimate different types of uncertainty: first, aleatoric uncertainty, which is related to the data, but also epistemic uncertainty, which quantifies the lack of knowledge the model has about the data distribution. More specifically, this thesis proposes a Bayesian neural network that can estimate these uncertainties in the context of a multivariate regression task. This model is applied to the regression of complex ellipticities on galaxy images as part of the ANR project "AstroDeep". These images can be corrupted by different sources of perturbation and noise, which can be reliably estimated through the different uncertainties. The exploitation of these uncertainties is then extended to galaxy mapping and to "coaching" the Bayesian neural network, a technique that consists of generating increasingly complex data during the model's training process to improve its performance. The problem of explainability, on the other hand, is approached from the perspective of counterfactual explanations. These explanations consist of identifying what changes to the input parameters would have led to a different prediction. Our contribution in this field is based on the generation of counterfactual explanations relying on a variational autoencoder (VAE) and an ensemble of predictors trained on the latent space generated by the VAE. This method is particularly well adapted to high-dimensional data such as images, in which case the explanations are referred to as counterfactual visual explanations. By exploiting both the latent space and the ensemble of classifiers, we can efficiently produce visual counterfactual explanations that reach a higher degree of realism than several state-of-the-art methods.
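The aleatoric/epistemic split described above can be illustrated with the standard law-of-total-variance decomposition over an ensemble of probabilistic regressors (a much-simplified stand-in for a full Bayesian neural network; the numbers are invented):

```python
from statistics import mean, pvariance

def decompose_uncertainty(predictions):
    """Law-of-total-variance split for an ensemble of probabilistic
    regressors, each returning (predictive mean, predictive variance):
    epistemic = variance of the member means (model disagreement),
    aleatoric = mean of the member variances (irreducible data noise)."""
    means = [m for m, _ in predictions]
    variances = [v for _, v in predictions]
    return pvariance(means), mean(variances)

# Three hypothetical ensemble members predicting one galaxy ellipticity:
epistemic, aleatoric = decompose_uncertainty([(0.10, 0.04), (0.12, 0.04), (0.14, 0.04)])
```

Here the members agree closely, so epistemic uncertainty is small relative to the data noise they each report.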
Boudane, Abdelhamid. "Fouille de données par contraintes". Thesis, Artois, 2018. http://www.theses.fr/2018ARTO0403/document.
In this thesis, we address the well-known clustering and association rule mining problems. Our first contribution introduces a new clustering framework where complex objects are described by propositional formulas. First, we extend the two well-known k-means and hierarchical agglomerative clustering techniques to deal with these complex objects. Second, we introduce a new divisive algorithm for clustering objects represented explicitly by sets of models. Finally, we propose a propositional satisfiability based encoding of the problem of clustering propositional formulas without the need for an explicit representation of their models. In a second contribution, we propose a new propositional satisfiability based approach to mine association rules in a single step. The task is modeled as a propositional formula whose models correspond to the rules to be mined. To highlight the flexibility of our framework, we also address other variants, namely the closed, minimal non-redundant, most general and indirect association rule mining tasks. Experiments on many datasets show that, on the majority of the considered association rule mining tasks, our declarative approach achieves better performance than state-of-the-art specialized techniques.
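The thesis encodes rule mining declaratively as a satisfiability problem; for contrast, a minimal procedural baseline for the same task (support/confidence over item pairs, with toy transactions) looks like this:

```python
from itertools import combinations
from collections import Counter

def mine_rules(transactions, min_support, min_conf):
    """Classical (non-declarative) association rule mining restricted to
    pairs: emit X -> Y when support({X, Y}) >= min_support and
    confidence = support({X, Y}) / support({X}) >= min_conf."""
    single = Counter(i for t in transactions for i in t)
    pair = Counter(c for t in transactions
                   for c in combinations(sorted(t), 2))
    rules = []
    for (x, y), s in pair.items():
        if s < min_support:
            continue
        for a, b in ((x, y), (y, x)):
            conf = s / single[a]
            if conf >= min_conf:
                rules.append((a, b, conf))
    return rules

transactions = [{"bread", "butter"}, {"bread", "butter"}, {"bread"}, {"milk"}]
rules = mine_rules(transactions, min_support=2, min_conf=0.6)
```

A SAT encoding replaces this enumeration with a formula whose models are exactly the rules satisfying the constraints, which is what makes the closed, minimal non-redundant and indirect variants easy to express.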
Cohen, Jérémy E. "Fouille de données tensorielles environnementales". Thesis, Université Grenoble Alpes (ComUE), 2016. http://www.theses.fr/2016GREAT054/document.
Among commonly used data mining techniques, few are able to take advantage of the multiway structure of data in the form of a multiway array. In contrast, tensor decomposition techniques specifically look for the intricate processes underlying the data, where each of these processes can be used to describe all ways of the data array. The work reported in the following pages aims at incorporating various kinds of external knowledge into the tensor canonical polyadic decomposition, which is usually understood as a blind model. The first two chapters of this manuscript introduce tensor decomposition techniques from, respectively, a mathematical and an application standpoint. In the third chapter, the many faces of constrained decompositions are explored, including a unifying framework for constrained decomposition, some decomposition algorithms, compression and dictionary-based tensor decomposition. The fourth chapter discusses the modeling of subject variability when multiple data arrays are available, stemming from one or multiple subjects sharing similarities. State-of-the-art techniques are studied and expressed as particular cases of a more general flexible coupling model introduced later. The chapter ends with a discussion of dimensionality reduction when subject variability is involved, as well as some open problems.
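A rank-1 canonical polyadic decomposition can be sketched in a few lines of alternating least squares (unconstrained and unregularized, unlike the constrained variants studied in the thesis; the test tensor is synthetic):

```python
def rank1_cp(T, iters=50):
    """Rank-1 CP approximation T[i][j][k] ~ a[i]*b[j]*c[k], found by
    alternating least squares: fix two factors, solve for the third.
    Individual factors are recovered only up to scaling, so quality is
    judged on the reconstruction, not on the factors themselves."""
    I, J, K = len(T), len(T[0]), len(T[0][0])
    a, b, c = [1.0] * I, [1.0] * J, [1.0] * K
    for _ in range(iters):
        a = [sum(T[i][j][k] * b[j] * c[k] for j in range(J) for k in range(K))
             / (sum(x * x for x in b) * sum(x * x for x in c)) for i in range(I)]
        b = [sum(T[i][j][k] * a[i] * c[k] for i in range(I) for k in range(K))
             / (sum(x * x for x in a) * sum(x * x for x in c)) for j in range(J)]
        c = [sum(T[i][j][k] * a[i] * b[j] for i in range(I) for j in range(J))
             / (sum(x * x for x in a) * sum(x * x for x in b)) for k in range(K)]
    return a, b, c

# A synthetic rank-1 tensor built from known factors:
A, B, C = [1.0, 2.0], [3.0, 4.0], [5.0, 6.0]
T = [[[ai * bj * ck for ck in C] for bj in B] for ai in A]
a, b, c = rank1_cp(T)
```

Constrained variants add projections (e.g. non-negativity, dictionary membership) after each least-squares update.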
Turmeaux, Teddy. "Contraintes et fouille de données". Orléans, 2004. http://www.theses.fr/2004ORLE2048.
Prudhomme, Elie. "Représentation et fouille de données volumineuses". Thesis, Lyon 2, 2009. http://www.theses.fr/2009LYO20048/document.
Braud, Agnès. "Fouille de données par algorithmes génétiques". Orléans, 2002. http://www.theses.fr/2002ORLE2011.
Francisci, Dominique. "Techniques d'optimisation pour la fouille de données". PhD thesis, Université de Nice Sophia-Antipolis, 2004. http://tel.archives-ouvertes.fr/tel-00216131.
Collard, Martine. "Fouille de données, Contributions Méthodologiques et Applicatives". Habilitation à diriger des recherches, Université Nice Sophia Antipolis, 2003. http://tel.archives-ouvertes.fr/tel-01059407.
Lhote, Loïck. "L'algorithmique: la fouille de données et l'arithmétique". PhD thesis, Université de Caen, 2006. http://tel.archives-ouvertes.fr/tel-00092862.
Karoui, Lobna. "Extraction contextuelle d'ontologie par fouille de données". Paris 11, 2008. http://www.theses.fr/2008PA112220.
Lahbib, Dhafer. "Préparation non paramétrique des données pour la fouille de données multi-tables". PhD thesis, Université de Cergy Pontoise, 2012. http://tel.archives-ouvertes.fr/tel-00854142.
Clech, Jérémie. "Contribution méthodologique à la fouille de données complexes". Lyon 2, 2004. http://theses.univ-lyon2.fr/documents/lyon2/2004/clech_j.
Laurent, Anne. "Bases de données multidimensionnelles floues et leur utilisation pour la fouille de données". Paris 6, 2002. http://www.theses.fr/2002PA066426.
Texto completoEl, Mahrsi Mohamed Khalil. "Analyse et fouille de données de trajectoires d'objets mobiles". Thesis, Paris, ENST, 2013. http://www.theses.fr/2013ENST0056/document.
In this thesis, we explore two problems related to managing and mining moving object trajectories. First, we study the problem of sampling trajectory data streams. Storing the entirety of the trajectories provided by modern location-aware devices can entail severe storage and processing overheads, so adapted sampling techniques are necessary to discard unneeded positions and reduce the size of the trajectories while still preserving their key spatiotemporal features. In streaming environments, this process needs to be conducted on the fly, since the data are transient and arrive continuously. To this end, we introduce a new sampling algorithm called spatiotemporal stream sampling (STSS). This algorithm is computationally efficient and guarantees an upper bound on the approximation error introduced during the sampling process. Experimental results show that STSS achieves good performance and can compete with more sophisticated and costly approaches. The second problem we study is clustering trajectory data in road network environments. We present three approaches to clustering such data: the first discovers clusters of trajectories that traveled along the same parts of the road network; the second is segment-oriented and aims to group together road segments based on the trajectories they have in common; the third combines both aspects and simultaneously clusters trajectories and road segments. We show how these approaches can be used to reveal useful knowledge about flow dynamics and to characterize traffic in road networks, and we provide experimental results evaluating the performance of our proposals.
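STSS itself guarantees an error bound; the sketch below is a far simpler deviation-based stream sampler in the same spirit (invented coordinates, no temporal dimension):

```python
import math

def sample_stream(points, eps):
    """Keep a streamed point only when its perpendicular distance to the
    line through the last two kept points exceeds eps, so straight runs
    of the trajectory are compressed and turns are preserved."""
    kept = list(points[:2])  # always keep the first two points
    for p in points[2:]:
        a, b = kept[-2], kept[-1]
        # 2D cross product gives twice the triangle area; dividing by
        # the base length yields the perpendicular distance of p.
        cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
        if abs(cross) / math.dist(a, b) > eps:
            kept.append(p)
    return kept

track = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2)]
sampled = sample_stream(track, eps=0.5)
```

The straight east-bound run collapses to its endpoints while the turn north is retained.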
Boullé, Marc. "Recherche d'une représentation des données efficace pour la fouille des grandes bases de données". Phd thesis, Télécom ParisTech, 2007. http://pastel.archives-ouvertes.fr/pastel-00003023.
Texto completoCharmpi, Konstantina. "Méthodes statistiques pour la fouille de données dans les bases de données de génomique". Thesis, Université Grenoble Alpes (ComUE), 2015. http://www.theses.fr/2015GRENM017/document.
Our focus is on statistical testing methods that compare a given vector of numeric values, indexed by all genes in the human genome, to a given set of genes, known for instance to be associated with a particular type of cancer. Among existing methods, Gene Set Enrichment Analysis (GSEA) is the most widely used. However, it has several drawbacks. Firstly, the calculation of p-values is very time-consuming and insufficiently precise. Secondly, like most other methods, it outputs a large number of significant results, the majority of which are not biologically meaningful. These two issues are addressed here by two new statistical procedures, the Weighted and Doubly Weighted Kolmogorov-Smirnov tests. The two tests have been applied both to simulated and real data and compared with other existing procedures. Our conclusion is that, beyond their mathematical and algorithmic advantages, the WKS and DWKS tests can be more informative in many cases than the classical GSEA test and efficiently address the issues that led to their construction.
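The classical (unweighted) Kolmogorov-Smirnov-style running sum that GSEA builds on, and that the WKS/DWKS tests reweight, can be sketched as follows (toy gene names):

```python
def enrichment_score(ranked_genes, gene_set):
    """Walk down the ranked gene list, stepping up at genes in the set
    and down otherwise; return the extreme of the running sum. A set
    concentrated at the top (or bottom) of the ranking scores near +1 (or -1)."""
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    up, down = 1.0 / n_hit, 1.0 / n_miss
    running, best = 0.0, 0.0
    for h in hits:
        running += up if h else -down
        if abs(running) > abs(best):
            best = running
    return best

ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]  # most to least associated
top_set = enrichment_score(ranked, {"g1", "g2"})     # concentrated at the top
bottom_set = enrichment_score(ranked, {"g5", "g6"})  # concentrated at the bottom
```

The weighted variants replace the uniform step sizes with steps proportional to the genes' values, which is where the thesis's contribution lies.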
Aouiche, Kamel. "Techniques de fouille de données pour l'optimisation automatique des performances des entrepôts de données". Lyon 2, 2005. http://theses.univ-lyon2.fr/documents/lyon2/2005/aouiche_k.
With the development of databases in general and data warehouses in particular, it has become very important to reduce the burden of administration. The aim of auto-administrative systems is to administer and adapt themselves automatically, without loss, or even with a gain, in performance. The idea of using data mining techniques to extract, from the data themselves, knowledge useful for administration has been in the air for some years, yet, as far as we know, no such research had been carried out. It nevertheless remains a very promising approach, notably in the field of data warehousing, where queries are very heterogeneous and cannot be interpreted easily. The aim of this thesis is to study auto-administration techniques in databases and data warehouses, mainly performance optimization techniques such as indexing and view materialization, and to look for a way of extracting from the stored data themselves knowledge useful for applying these techniques. We have designed a tool that finds an index and view configuration that optimizes data access time. Our tool searches for frequent itemsets in a given workload and clusters the query workload to compute this configuration. Finally, we have extended performance optimization to XML data warehouses. In this area, we propose an indexing technique that precomputes joins between XML facts and dimensions, and we adapted our materialized view selection strategy to XML materialized views.
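The workload-mining idea can be sketched as counting which attribute combinations co-occur in queries; frequent combinations become candidate (multi-column) indexes. A toy version, not the thesis's actual algorithm:

```python
from itertools import combinations
from collections import Counter

def frequent_attribute_sets(workload, min_support):
    """Count attribute combinations (up to pairs) co-occurring in queries;
    combinations meeting min_support are candidate (multi-column) indexes."""
    counts = Counter()
    for attrs in workload:
        for size in (1, 2):
            for combo in combinations(sorted(attrs), size):
                counts[combo] += 1
    return {c: n for c, n in counts.items() if n >= min_support}

# Hypothetical workload: each set holds the attributes one query filters on.
workload = [{"city", "date"}, {"city", "date"}, {"city"}, {"product"}]
candidates = frequent_attribute_sets(workload, min_support=2)
```

A real index advisor would additionally weigh candidates by query cost and index maintenance overhead before materializing anything.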
Jacquemont, Stéphanie. "Contributions de l'inférence grammaticale à la fouille de données séquentielles". Phd thesis, Université Jean Monnet - Saint-Etienne, 2008. http://tel.archives-ouvertes.fr/tel-00366358.
In this context, we showed that the raw exploitation not only of the original sequences but also of the probabilistic automata inferred from them does not necessarily guarantee the extraction of relevant knowledge. In this thesis we made several contributions, in the form of minimal bounds and statistical constraints, that ensure a fruitful exploitation of sequences and probabilistic automata. Moreover, thanks to our model, we provide an efficient solution to certain applications involving the preservation of individuals' privacy.
Ramstein, Gérard. "Application de techniques de fouille de données en Bio-informatique". Habilitation à diriger des recherches, Université de Nantes, 2012. http://tel.archives-ouvertes.fr/tel-00706566.
Khiali, Lynda. "Fouille de données à partir de séries temporelles d’images satellites". Thesis, Montpellier, 2018. http://www.theses.fr/2018MONTS046/document.
Nowadays, remotely sensed images constitute a rich source of information that can be leveraged to support several applications, including risk prevention, land use planning, land cover classification and many other tasks. In this thesis, Satellite Image Time Series (SITS) are analysed to depict the dynamics of natural and semi-natural habitats. The objective is to identify, organize and highlight the evolution patterns of these areas. We introduce an object-oriented method to analyse SITS that considers segmented satellite images. First, we identify the evolution profiles of the objects in the time series. Then, we analyse these profiles using machine learning methods. To identify the evolution profiles, we explore all the objects to select a subset (spatio-temporal entities / reference objects) to be tracked. The evolution of the selected spatio-temporal entities is described using evolution graphs. To analyse these evolution graphs, we introduce three contributions. The first explores annual SITS and analyses the evolution graphs using clustering algorithms to identify similar evolutions among the spatio-temporal entities. In the second contribution, we perform a multi-annual cross-site analysis: we consider several study areas described by multi-annual SITS and use clustering algorithms to identify intra- and inter-site similarities. In the third contribution, we introduce a semi-supervised method based on constrained clustering, with a method to select the constraints that guide the clustering and adapt the results to the user's needs. Our contributions were evaluated on several study areas. The experimental results allow us to pinpoint relevant landscape evolutions in each study site, and we also identify evolutions common to several sites. In addition, the constraint selection method proposed for constrained clustering identifies relevant entities. Thus, the results obtained using unsupervised learning were improved and adapted to meet the user's needs.
Jollois, François-Xavier. "Contribution de la classification automatique à la fouille de données". Metz, 2003. http://docnum.univ-lorraine.fr/public/UPV-M/Theses/2003/Jollois.Francois_Xavier.SMZ0311.pdf.
Do, Thanh-Nghi. "Visualisation et séparateurs à vaste marge en fouille de données". Nantes, 2004. http://www.theses.fr/2004NANT2072.
We present different cooperative approaches using visualization methods and support vector machine (SVM) algorithms for knowledge discovery in databases (KDD). Most existing data mining approaches construct the model in an automatic way; the user is not involved in the mining process. Furthermore, these approaches must be able to deal with the challenge of large datasets. Our work aims at increasing the human role in the KDD process (by way of visualization methods) and at improving the performance (execution time and memory requirements) of methods for mining large datasets. We present: parallel and distributed SVM algorithms for mining massive datasets; interactive graphical methods to explain SVM results; and cooperative approaches to involve the user more significantly in the model construction.
Dalloux, Clément. "Fouille de texte et extraction d'informations dans les données cliniques". Thesis, Rennes 1, 2020. http://www.theses.fr/2020REN1S050.
With the introduction of clinical data warehouses, more and more health data are available for research purposes. While a significant part of these data exists in structured form, much of the information contained in electronic health records is available as free text that can be used for many tasks. In this manuscript, two tasks are explored: the multi-label classification of clinical texts, and the detection of negation and uncertainty. The first is studied in cooperation with the Rennes University Hospital, owner of the clinical texts that we use, while for the second we use publicly available biomedical texts that we annotate and release free of charge. To solve these tasks, we propose several approaches based mainly on deep learning algorithms, used in supervised and unsupervised learning settings.
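For the negation-detection task, a NegEx-style window heuristic gives the flavour of a rule-based baseline (the cue list and window size here are illustrative, far simpler than the deep learning approaches the thesis actually proposes):

```python
import re

NEG_CUES = re.compile(r"\b(no|not|denies|without)\b")

def is_negated(sentence, term):
    """NegEx-style heuristic: `term` counts as negated when a negation
    cue appears within the five tokens preceding it."""
    tokens = sentence.lower().split()
    if term.lower() not in tokens:
        return False
    idx = tokens.index(term.lower())
    window = " ".join(tokens[max(0, idx - 5):idx])
    return bool(NEG_CUES.search(window))
```

Neural approaches instead learn the cue vocabulary and, crucially, the scope of each cue, which fixed windows handle poorly.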
Mondal, Kartick Chandra. "Algorithmes pour la fouille de données et la bio-informatique". Thesis, Nice, 2013. http://www.theses.fr/2013NICE4049.
Knowledge pattern extraction is one of the major topics in the data mining and background knowledge integration domains. Among the many data mining techniques, association rule mining and bi-clustering are two major complementary tasks for these topics. These tasks have gained much importance in many domains in recent years, yet no approach had been proposed to perform them in a single process. This poses the problems of the resources required (memory, execution time and data accesses) to perform independent extractions, and of the unification of the different results. We propose an original approach for extracting different categories of knowledge patterns while using minimal resources. This approach is based on the frequent closed patterns theoretical framework and uses a novel suffix-tree based data structure to extract conceptual minimal representations of association rules, bi-clusters and classification rules. These patterns extend the classical frameworks of association and classification rules and of bi-clusters, as the data objects supporting each pattern and the hierarchical relationships between patterns are also extracted. The approach was applied to the analysis of HIV-1 and human protein-protein interaction data. Analyzing such inter-species protein interactions is a recent major challenge in computational biology. Databases integrating heterogeneous interaction information and biological background knowledge on proteins have been constructed. Experimental results show that the proposed approach can efficiently process these databases and that the extracted conceptual patterns can help the understanding and analysis of the nature of relationships between interacting proteins.
Muhlenbach, Fabrice. "Evaluation de la qualité de la représentation en fouille de données". Lyon 2, 2002. http://demeter.univ-lyon2.fr:8080/sdx/theses/lyon2/2002/muhlenbach_f.
Knowledge discovery tries to produce novel and usable knowledge from databases. In this whole process, data mining is the crucial machine learning step, but some questions must be asked first: how can we get an a priori idea of whether the labels of the class attribute are separable or not? How can we deal with databases where some examples are mislabeled? How can we transform continuous predictive attributes into discrete ones in a supervised way, taking into account the global information of the data? We propose responses to these problems. Our solutions take advantage of the properties of a geometrical tool: neighbourhood graphs. The neighbourhood between examples projected in a multidimensional space gives us a way of characterising the likeness between the examples to learn. We develop a statistical test based on the weight of the edges that must be removed from a neighbourhood graph so that only single-class subgraphs remain; this gives information about the a priori class separability. This work is carried on in the context of detecting examples in a database that have doubtful labels: we propose a strategy for removing or relabeling these doubtful examples in the learning set to improve the quality of the resulting predictive model. These researches are extended to the special case of a continuous class to learn: we present a structure test to predict this kind of variable. Finally, we present a supervised polythetic discretization method based on neighbourhood graphs and demonstrate its performance by using it with a new supervised machine learning algorithm.
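The separability idea can be illustrated with a nearest-neighbour graph and the fraction of edges crossing classes (a simple 1-NN stand-in for the neighbourhood graphs and the edge-weight statistical test developed in the thesis; points and labels are invented):

```python
import math

def cut_edge_ratio(points, labels):
    """Link each point to its nearest neighbour and return the fraction
    of edges joining two different classes: a rough a priori indicator
    of class separability (0 = well separated, 1 = fully mixed)."""
    edges = set()
    for i, p in enumerate(points):
        j = min((k for k in range(len(points)) if k != i),
                key=lambda k: math.dist(p, points[k]))
        edges.add((min(i, j), max(i, j)))
    cut = sum(1 for i, j in edges if labels[i] != labels[j])
    return cut / len(edges)

# Two tight, well-separated clusters: no cut edges expected.
separated = cut_edge_ratio([(0, 0), (0, 1), (5, 0), (5, 1)], ["a", "a", "b", "b"])
```

The thesis's test goes further, assessing whether the observed cut-edge weight is significantly lower than expected under random labeling.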
Liu, Xueliang. "Fouille d'informations multimédia partagées orienté événements". Electronic Thesis or Diss., Paris, ENST, 2012. http://www.theses.fr/2012ENST0071.
The exponential growth of social media data requires scalable, effective and robust technologies to manage and index them. Events are among the most important cues for recalling people's past memories. With the development of Web 2.0, many event-based information sharing sites have appeared online, and a wide variety of events are scheduled and described by several social online services. The study of the relation between social media and events can leverage event domain knowledge and ontologies to formulate the problems raised, and can also exploit multimodal features to mine the patterns more deeply, hence gaining better performance compared with other methods. In this thesis, we study the problem of mining relations between events and social media data. Three main problems are investigated. The first is event enrichment, in which we investigate how to leverage social media to illustrate events. The second is event discovery, which focuses on discovering event patterns from social media streams; we propose burst-detection and topic-model based methods to find events in spatially and temporally labeled social media. The third is visual event modeling, which studies the problem of automatically collecting training samples to model the visual appearance of events; the solution for collecting both positive and negative samples is also derived from the analysis of the social media context. Thanks to the approaches proposed in this thesis, the intrinsic relationship between social media and events is deeply investigated, which provides a way to explore and organize online media effectively.
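The burst-detection component can be caricatured with a moving-average threshold over per-time-step activity counts (the counts are invented; the thesis couples detection with topic models and spatial labels):

```python
from statistics import mean, stdev

def detect_bursts(counts, window=3, z=2.0):
    """Flag time steps whose activity count exceeds the moving average
    of the previous `window` steps by z standard deviations."""
    bursts = []
    for t in range(window, len(counts)):
        history = counts[t - window:t]
        mu, sigma = mean(history), stdev(history)
        # the tiny floor stops a perfectly flat history from collapsing
        # the threshold onto the mean
        if counts[t] > mu + z * max(sigma, 1e-9):
            bursts.append(t)
    return bursts

# Hypothetical hourly photo-upload counts with one spike:
bursts = detect_bursts([10, 11, 10, 11, 50, 11, 10])
```

Each flagged step would then be handed to a topic model to decide which event, if any, the burst corresponds to.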
Berasaluce, Sandra. "Fouille de données et acquisition de connaissances à partir de bases de données de réactions chimiques". Nancy 1, 2002. http://docnum.univ-lorraine.fr/public/SCD_T_2002_0266_BERASALUCE.pdf.
Chemical reaction databases, indispensable tools for synthetic chemists, are not free from flaws. In this thesis, we have tried to overcome the limits of these databases by adding knowledge which structures the data. This allows us to consider new, efficient ways of querying these databases; the final goal is to design systems having the functionalities of both databases and knowledge-based systems. In the knowledge acquisition process, we emphasized the modelling of chemical objects. Thus, we were interested in synthetic methods, which we described in terms of synthetic objectives. Afterwards, we used the elaborated model to apply data mining techniques and extract knowledge from chemical reaction databases. The experiments we performed with Resyn Assistant concerned the synthetic methods which construct monocycles, and functional interchanges, and gave trends in good agreement with the domain knowledge.
Dumont, Jerome. "Fouille de dynamiques multivariées, application à des données temporelles en cardiologie". Phd thesis, Université Rennes 1, 2008. http://tel.archives-ouvertes.fr/tel-00364720.
Szathmary, Laszlo. "Méthodes symboliques de fouille de données avec la plate-forme Coron". PhD thesis, Université Henri Poincaré - Nancy I, 2006. http://tel.archives-ouvertes.fr/tel-00336374.
The main contributions of this thesis are: (1) we developed and adapted algorithms for finding minimal non-redundant association rules; (2) we defined a new basis for association rules called "closed rules"; (3) we studied an important but relatively under-explored field of KDD, namely the extraction of rare patterns and rare association rules; (4) we gathered our algorithms, a collection of other algorithms, and other auxiliary KDD operations in a software toolbox called Coron.
Stattner, Erick. "Contributions à l'étude des réseaux sociaux : propagation, fouille, collecte de données". Phd thesis, Université des Antilles-Guyane, 2012. http://tel.archives-ouvertes.fr/tel-00830882.
Da Costa, David. "Visualisation et fouille interactive de données à base de points d'intérêts". Tours, 2007. http://www.theses.fr/2007TOUR4021.
In this thesis, we address the problem of visual data mining. We generally notice that it is specific to the types of data handled, and that one must spend a long time analyzing the results in order to get an answer about the shape of the data. We have therefore developed an interactive visualization environment for data exploration using points of interest. This tool visualizes all types of data and is generic because it uses only a similarity measure. Such methods must be able to deal with large data sets; we also sought to improve the performance of our visualization algorithms and managed to represent one million data points. We also extended our tool to data clustering. Most existing data clustering methods work in an automatic way, with the user not involved in the process. We try to involve the user more significantly in the data clustering process in order to improve their comprehension of the results.
Chaibi, Amine. "Contribution en apprentissage topologique non supervisé pour la fouille de données". Paris 13, 2013. http://scbd-sto.univ-paris13.fr/secure/edgalilee_th_2013_chaibi.pdf.
The research outlined in this thesis concerns the development of approaches based on self-organizing maps for group-outlier and novelty detection, bi-clustering, and confidence interval estimation. For each problem, an unsupervised learning model is proposed. The first model is dedicated to group-outlier detection and proposes a new measure named GOF (Group Outlier Factor), estimated by unsupervised learning and integrated into topological map learning. Our approach is based on the density of each group of data and simultaneously provides a data partitioning and a quantitative indicator (GOF) of the "outlier-ness" of each cluster or group. Thereafter, the GOF measure is used as a classifier for the novelty detection problem: we develop a GOF-based approach that automatically detects new data that were not known during the learning process. The second model, titled BiTM (Bi-clustering using Topological Map), addresses the bi-clustering problem. BiTM is based on self-organizing maps and provides a simultaneous clustering of the rows and columns of the data matrix in order to increase the homogeneity of the bi-clusters, respecting the neighborhood relationship and using a single map. BiTM maps provide a new topological visualization of the bi-clusters. The third contribution addresses the confidence interval estimation problem in time series. The Anticipeo company offers a solution that performs detailed forecasts for different customers. In addition to its standard solution, we developed a complementary tool for confidence interval estimation and for classifying products according to their statistical characteristics. In this thesis, we used different evaluations based on performance measures and visualizations. The results obtained are encouraging and promising for continuing in this direction.
Fangseu, Badjio Edwige P. "Evaluation qualitative et guidage des utilisateurs en fouille visuelle de données". Lyon 2, 2005. http://theses.univ-lyon2.fr/documents/lyon2/2005/fangseubadjio_ep.
Texto completo
The research context of this work is the visual data mining domain, and more precisely supervised data classification. Other related fields are: knowledge extraction from data, machine learning, interface quality, software ergonomics, software engineering and human-machine interaction. The result provided by a visual data mining tool is a data model. Generally, the quality of visual data mining tools is assessed by estimating the misclassification rate. We believe that this estimation is necessary but not sufficient for the evaluation of visual data mining tools. In fact, this type of tool uses interfaces, graphical representations and data sets, and requires the participation of end-users. On the basis of a state of the art on visualization, visual data mining and software quality, we propose two analysis and evaluation methods: an inspection method for experts, and a diagnosis method which can be used by end-users for analysis and quality evaluation, taking into account the specificities of the treated domain. We developed guidelines and quality criteria (measures and metrics) for the analysis and diagnosis of visual data mining tools. From the users' point of view, in order to use information relating to their profiles and preferences throughout the mining process, we also proposed a user model for visual data mining tools. Case studies performed with the proposed diagnosis method enabled us to raise problems other than those resulting from the estimation of the misclassification rate. This work also presents solutions to two problems identified during the analysis and diagnosis of some existing visual data mining tools: the choice of the best algorithm to apply for a supervised classification task, and the preprocessing of very large data sets. We considered the choice of the best classification algorithm as a multi-criteria decision problem.
Artificial intelligence provides solutions for multi-criteria analysis. We use results from this domain, through the multi-agent paradigm and case-based reasoning, to propose a list of algorithms ranked by decreasing effectiveness for the resolution of a given problem, and to evolve the knowledge in the case base. For the treatment of very large data sets, the limits of visual approaches concerning the number of records and the number of attributes are well known. To be able to treat these data sets, one solution is to preprocess the data set before applying the interactive algorithm. The number of records is reduced by applying a clustering algorithm; the number of attributes is reduced by combining the results of feature selection algorithms through consensus theory (with a visual weight assignment tool). We evaluate the performance of our new approaches on data sets from the UCI and Kent Ridge Bio-Medical Dataset repositories
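The consensus combination of feature selection results can be sketched as a weighted Borda-style vote. The scoring scheme and the weights below are illustrative assumptions standing in for the thesis's consensus-theory machinery (where weights would come from the visual assignment tool):

```python
def consensus_features(rankings, weights, top_k=2):
    # Weighted Borda-style consensus: each ranking contributes
    # (weight x reversed rank position) to a feature's score;
    # the top_k features by total score are kept.
    scores = {}
    for ranking, w in zip(rankings, weights):
        for pos, feat in enumerate(ranking):
            scores[feat] = scores.get(feat, 0.0) + w * (len(ranking) - pos)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

For example, combining two rankings with weights 2 and 1 keeps the features that are well placed in the more trusted ranking.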
Fu, Huaiguo. "Algorithmique des treillis de concepts : application à la fouille de données". Artois, 2005. http://www.theses.fr/2005ARTO0401.
Texto completo
Our main concern in this thesis is concept (or Galois) lattices and their application to data mining. We carry out a comparison of different concept lattice algorithms on benchmarks taken from UCI. During this comparison, we analyse how the duality between objects and attributes affects each algorithm's performance. This analysis shows that the running time of an algorithm may vary considerably depending on whether the formal context or the transposed context is used. Using the divide-and-conquer paradigm, we design a new concept lattice algorithm, ScalingNextClosure, which decomposes the search space into many partitions and builds the formal concepts of each partition independently. By reducing the search space, ScalingNextClosure can run efficiently with little memory and thus handle huge formal contexts, provided the whole context can be loaded into memory. An experimental comparison between NextClosure and ScalingNextClosure shows the efficiency of this decomposition approach: on any huge dataset, ScalingNextClosure runs faster than NextClosure on a sequential machine, with an average speedup factor of 10. Another advantage of ScalingNextClosure is that it can easily be implemented on a distributed or parallel architecture. Mining frequent closed itemsets (FCI) is a subproblem of mining association rules. We adapt ScalingNextClosure to mine frequent closed itemsets and design a new algorithm, called PFC. PFC uses the support measure to prune the search space within one partition. An experimental comparison, conducted on a sequential architecture, between PFC and one of the efficient FCI systems is discussed
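The closure mechanism underlying NextClosure-style algorithms can be illustrated on a toy formal context. The context and the brute-force enumeration below are for illustration only; NextClosure itself enumerates closures in lectic order without generating duplicates:

```python
from itertools import combinations

# Toy formal context: object -> set of attributes it possesses.
context = {
    "g1": {"a", "b"},
    "g2": {"a", "c"},
    "g3": {"a", "b", "c"},
}
attrs = {"a", "b", "c"}

def extent(A):   # objects having every attribute in A
    return {g for g, s in context.items() if A <= s}

def intent(G):   # attributes shared by every object in G
    return set.intersection(*(context[g] for g in G)) if G else set(attrs)

def concepts():
    # Naive enumeration: every concept intent is the closure
    # intent(extent(A)) of some attribute subset A.
    found = set()
    for r in range(len(attrs) + 1):
        for A in combinations(sorted(attrs), r):
            c = intent(extent(set(A)))          # closure of A
            found.add((frozenset(extent(c)), frozenset(c)))
    return found
```

Each concept pairs a maximal object set with the exact attribute set those objects share; on this context the lattice has four concepts, from the top ({g1, g2, g3}, {a}) down to ({g3}, {a, b, c}).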
Dumont, Jérôme. "Fouille de dynamiques multivariées : application à des données temporelles en cardiologie". Rennes 1, 2008. http://www.theses.fr/2008REN1S078.
Texto completo
This manuscript focuses on the problem of analysing the dynamics of time series observed in cardiology. The proposed solution is divided into two steps. The first consists in extracting useful information from the ECG by segmenting each beat with a wavelet decomposition algorithm adapted from the literature. The difficult problem of optimising both thresholds and time windows is solved with evolutionary algorithms. The second step relies on hidden semi-Markov models to represent the time series made up of the extracted variables. An unsupervised classification algorithm is proposed to retrieve the natural groups. Applied to the detection of ischemic episodes and to the analysis of stress ECGs from patients suffering from Brugada syndrome, this method shows higher performance than more traditional approaches
Kharrat, Ahmed. "Fouille de données spatio-temporelles appliquée aux trajectoires dans un réseau". Versailles-St Quentin en Yvelines, 2013. http://www.theses.fr/2013VERS0042.
Texto completo
Recent years have seen the development of data mining techniques for many application areas in order to analyze large and complex data. At the same time, the increasing deployment of location-acquisition technologies such as GPS leads to the production of large datasets of geolocation traces. In this thesis, we are interested in mining trajectories of moving objects, such as vehicles in a road network. We propose a method for discovering dense routes by clustering similar road sections according to both traffic and location in each time period. The traffic estimation is based on the collected spatio-temporal trajectories. We also propose an approach for characterizing the temporal evolution of dense routes by a graph connecting dense routes over consecutive time periods; this graph is labelled with a degree of evolution. Our last proposal concerns the discovery of mobility patterns and the use of these patterns to define a new representation of generalised trajectories
Belghiti, Moulay Tayeb. "Modélisation et techniques d'optimisation en bio-informatique et fouille de données". Thesis, Rouen, INSA, 2008. http://www.theses.fr/2008ISAM0002.
Texto completo
This Ph.D. thesis deals with two types of problems: clustering and multiple sequence alignment. Our objective is to solve these global problems efficiently and to test the DC programming approach and DCA on real datasets. The thesis is divided into three parts. The first part is devoted to new approaches in nonconvex and global optimization; we present an in-depth study of the framework used in this thesis, namely DC programming and the DC algorithm (DCA). In the second part, we model the clustering problem as three nonconvex subproblems. The first two subproblems differ in the choice of norm (clustering via the L1 and L2 norms); the third uses the kernel method (kernel clustering). The third part is devoted to bioinformatics and focuses on the modeling and resolution of two subproblems: multiple sequence alignment and RNA sequence alignment. All chapters except the first end with numerical tests
Oudni, Amal. "Fouille de données par extraction de motifs graduels : contextualisation et enrichissement". Thesis, Paris 6, 2014. http://www.theses.fr/2014PA066437/document.
Texto completo
This thesis's work belongs to the framework of knowledge extraction and data mining applied to numerical or fuzzy data, in order to extract linguistic summaries in the form of gradual itemsets: the latter express correlations between attribute values, of the form « the more the temperature increases, the more the pressure increases ». Our goal is to contextualize and enrich these gradual itemsets by proposing different types of additional information so as to increase their quality and provide a better interpretation. We propose four types of new itemsets. First, reinforced gradual itemsets, in the case of fuzzy data, perform a contextualization by integrating additional attributes linguistically introduced by the expression « all the more ». They can be illustrated by the example « the more the temperature decreases, the more the volume of air decreases, all the more so as its density increases ». Reinforcement is interpreted as increased validity of the gradual itemset. In addition, we study the extension of the concept of reinforcement to association rules, discussing their possible interpretations and showing their limited contribution. We then propose to process the contradictory itemsets that arise, for example, in the case of the simultaneous extraction of « the more the temperature increases, the more the humidity increases » and « the more the temperature increases, the more the humidity decreases ». To manage these contradictions, we define a constrained variant of the gradual itemset support which, in particular, depends not only on the considered itemset but also on its potential contradictors. We also propose two extraction methods: the first consists in filtering after all itemsets have been generated; the second integrates the filtering process within the generation step.
We then introduce characterized gradual itemsets, defined by adding a clause linguistically introduced by the expression « especially if », which can be illustrated by a sentence such as « the more the temperature decreases, the more the humidity decreases, especially if the temperature varies in [0, 10] °C »: the additional clause specifies value ranges on which the validity of the itemset is increased. We formalize the quality of this enrichment as a trade-off between two constraints imposed on the identified interval, namely high validity and large size, as well as an extension taking into account the data density. We propose a method to automatically extract characterized gradual itemsets, based on appropriate mathematical morphology tools and the definition of an appropriate filter and transcription
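The support of a gradual itemset such as « the more the temperature increases, the more the pressure increases » can be sketched with one classical pair-based definition, used here as an illustrative assumption (the thesis's constrained variant additionally accounts for contradictors):

```python
from itertools import combinations

def gradual_support(rows, attr_x, attr_y):
    # Pair-based support: fraction of object pairs ordered the same way
    # on both attributes, i.e. pairs supporting "the more X, the more Y".
    pairs = list(combinations(rows, 2))
    concordant = sum(1 for r, s in pairs
                     if (r[attr_x] - s[attr_x]) * (r[attr_y] - s[attr_y]) > 0)
    return concordant / len(pairs)

data = [
    {"temperature": 10, "pressure": 1.0},
    {"temperature": 15, "pressure": 1.2},
    {"temperature": 20, "pressure": 1.5},
    {"temperature": 25, "pressure": 1.4},
]
```

On this toy dataset, five of the six object pairs are concordant (only the last two rows disagree), so the itemset's support is 5/6.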
Oudni, Amal. "Fouille de données par extraction de motifs graduels : contextualisation et enrichissement". Electronic Thesis or Diss., Paris 6, 2014. https://accesdistant.sorbonne-universite.fr/login?url=https://theses-intra.sorbonne-universite.fr/2014PA066437.pdf.
Texto completo
This thesis's work belongs to the framework of knowledge extraction and data mining applied to numerical or fuzzy data, in order to extract linguistic summaries in the form of gradual itemsets: the latter express correlations between attribute values, of the form « the more the temperature increases, the more the pressure increases ». Our goal is to contextualize and enrich these gradual itemsets by proposing different types of additional information so as to increase their quality and provide a better interpretation. We propose four types of new itemsets. First, reinforced gradual itemsets, in the case of fuzzy data, perform a contextualization by integrating additional attributes linguistically introduced by the expression « all the more ». They can be illustrated by the example « the more the temperature decreases, the more the volume of air decreases, all the more so as its density increases ». Reinforcement is interpreted as increased validity of the gradual itemset. In addition, we study the extension of the concept of reinforcement to association rules, discussing their possible interpretations and showing their limited contribution. We then propose to process the contradictory itemsets that arise, for example, in the case of the simultaneous extraction of « the more the temperature increases, the more the humidity increases » and « the more the temperature increases, the more the humidity decreases ». To manage these contradictions, we define a constrained variant of the gradual itemset support which, in particular, depends not only on the considered itemset but also on its potential contradictors. We also propose two extraction methods: the first consists in filtering after all itemsets have been generated; the second integrates the filtering process within the generation step.
We then introduce characterized gradual itemsets, defined by adding a clause linguistically introduced by the expression « especially if », which can be illustrated by a sentence such as « the more the temperature decreases, the more the humidity decreases, especially if the temperature varies in [0, 10] °C »: the additional clause specifies value ranges on which the validity of the itemset is increased. We formalize the quality of this enrichment as a trade-off between two constraints imposed on the identified interval, namely high validity and large size, as well as an extension taking into account the data density. We propose a method to automatically extract characterized gradual itemsets, based on appropriate mathematical morphology tools and the definition of an appropriate filter and transcription
Da, Silva Sébastien. "Fouille de données spatiales et modélisation de linéaires de paysages agricoles". Thesis, Université de Lorraine, 2014. http://www.theses.fr/2014LORR0156/document.
Texto completo
This thesis is part of a partnership between INRA and INRIA in the field of knowledge extraction from spatial databases. The study focuses on the characterization and simulation of agricultural landscapes. More specifically, we focus on the linear features that structure the agricultural landscape, such as roads, irrigation ditches and hedgerows. Our goal is to model the spatial distribution of hedgerows because of their role in many ecological and environmental processes. We study more specifically how to characterize the spatial structure of hedgerows in two contrasting agricultural landscapes, one located in south-eastern France (mainly composed of orchards) and the second in Brittany (western France, bocage-type). We determine whether or not the spatial distribution of hedgerows is structured by the position of the more perennial linear landscape features, such as roads and ditches. In such a case, we also detect the circumstances under which this spatial distribution is structured and the scale of these structures. The implementation of the Knowledge Discovery in Databases (KDD) process comprises different preprocessing steps and data mining algorithms which combine mathematical and computational methods. The first part of the thesis focuses on the creation of a statistical spatial index, based on a geometric neighborhood concept and allowing the characterization of hedgerow structures. This spatial index describes the structure of hedgerows in the landscape. The results show that hedgerows depend on more permanent linear elements at short distances, and that their neighborhood is uniform beyond 150 meters. In addition, different neighborhood structures have been identified depending on the orientation of hedgerows in south-eastern France, but not in Brittany. The second part of the thesis explores the potential of coupling linearization methods with Markov methods.
The linearization methods are based on the use of alternative Hilbert curves: adaptive Hilbert paths. The linearized spatial data thus constructed were then treated with Markov methods. These methods have the advantage of being usable both for machine learning and for generating new data, for example in the context of landscape simulation. The results show that the combination of these methods for learning and automatic generation of hedgerows captures some characteristics of the different study landscapes. The first simulations are encouraging despite the need for post-processing. Finally, this work has enabled the creation of a spatial data mining method based on different tools that support all the stages of a classic KDD process, from data selection to the visualization of results. Furthermore, this method was constructed in such a way that it can also be used for data generation, a component necessary for the simulation of landscapes
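The linearization step can be illustrated with the classic (non-adaptive) Hilbert curve mapping, which turns 2D grid positions into ranks along a space-filling curve so that neighbouring ranks tend to be 2D neighbours; the adaptive Hilbert paths of the thesis refine this basic scheme:

```python
def hilbert_rank(n, x, y):
    # Position of grid cell (x, y) along the Hilbert curve filling an
    # n x n grid (n a power of two); classic bit-manipulation version.
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:              # rotate the quadrant so the pattern recurses
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        s //= 2
    return d
```

Sorting spatial objects by `hilbert_rank` yields a 1D sequence on which Markov models can then be trained, which is the role linearization plays in the pipeline described above.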
Juniarta, Nyoman. "Fouille de données complexes et biclustering avec l'analyse formelle de concepts". Electronic Thesis or Diss., Université de Lorraine, 2019. http://www.theses.fr/2019LORR0199.
Texto completo
Knowledge discovery in databases (KDD) is a process applied to possibly large volumes of data for discovering patterns which can be significant and useful. In this thesis, we are interested in data transformation and data mining in knowledge discovery applied to complex data, and we present several experiments related to different approaches and different data types. The first part of this thesis focuses on the task of biclustering using formal concept analysis (FCA) and pattern structures. FCA is naturally related to biclustering, where the objective is to simultaneously group rows and columns which verify some regularities. Related to FCA, pattern structures are generalizations of it which work on more complex data. Partition pattern structures were proposed to discover constant-column biclusters, while interval pattern structures were studied for similar-column biclustering. Here we extend these approaches to enumerate other types of biclusters: additive, multiplicative, order-preserving, and coherent-sign-changes. The second part of this thesis focuses on two experiments in mining complex data. First, we present a contribution related to the CrossCult project, in which we analyze a dataset of visitor trajectories in a museum. We apply sequence clustering and FCA-based sequential pattern mining to discover patterns in the dataset and to classify these trajectories. This analysis can be used within the CrossCult project to build recommendation systems for future visitors. Second, we present our work related to the task of antibacterial drug discovery. The dataset for this task is generally a numerical matrix with molecules as rows and features/attributes as columns. The huge number of features makes molecule classification more complex for any classifier. Here we study a feature selection approach based on log-linear analysis which discovers associations among features.
As a synthesis, this thesis presents a series of different experiments in the mining of complex real-world data
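The constant-column bicluster, the simplest of the bicluster types mentioned above, can be made concrete with a naive enumeration. This brute-force sketch is for illustration only; partition pattern structures enumerate such biclusters far more efficiently:

```python
from itertools import combinations
from collections import defaultdict

def constant_column_biclusters(matrix, min_rows=2, min_cols=2):
    # A constant-column bicluster is a set of rows that take the same
    # value in each column of some column subset. Naively, group rows
    # by their value tuple on every candidate column subset.
    n_cols = len(matrix[0])
    found = []
    for k in range(min_cols, n_cols + 1):
        for cols in combinations(range(n_cols), k):
            groups = defaultdict(list)
            for i, row in enumerate(matrix):
                groups[tuple(row[c] for c in cols)].append(i)
            for rows in groups.values():
                if len(rows) >= min_rows:
                    found.append((tuple(rows), cols))
    return found
```

On a small matrix, this recovers, for example, the rows that agree on columns 0 and 1 as one bicluster, independently of the rows that agree on columns 1 and 2.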
Da, Silva Sébastien. "Fouille de données spatiales et modélisation de linéaires de paysages agricoles". Electronic Thesis or Diss., Université de Lorraine, 2014. http://docnum.univ-lorraine.fr/prive/DDOC_T_2014_0156_DA_SILVA.pdf.
Texto completo
This thesis is part of a partnership between INRA and INRIA in the field of knowledge extraction from spatial databases. The study focuses on the characterization and simulation of agricultural landscapes. More specifically, we focus on the linear features that structure the agricultural landscape, such as roads, irrigation ditches and hedgerows. Our goal is to model the spatial distribution of hedgerows because of their role in many ecological and environmental processes. We study more specifically how to characterize the spatial structure of hedgerows in two contrasting agricultural landscapes, one located in south-eastern France (mainly composed of orchards) and the second in Brittany (western France, bocage-type). We determine whether or not the spatial distribution of hedgerows is structured by the position of the more perennial linear landscape features, such as roads and ditches. In such a case, we also detect the circumstances under which this spatial distribution is structured and the scale of these structures. The implementation of the Knowledge Discovery in Databases (KDD) process comprises different preprocessing steps and data mining algorithms which combine mathematical and computational methods. The first part of the thesis focuses on the creation of a statistical spatial index, based on a geometric neighborhood concept and allowing the characterization of hedgerow structures. This spatial index describes the structure of hedgerows in the landscape. The results show that hedgerows depend on more permanent linear elements at short distances, and that their neighborhood is uniform beyond 150 meters. In addition, different neighborhood structures have been identified depending on the orientation of hedgerows in south-eastern France, but not in Brittany. The second part of the thesis explores the potential of coupling linearization methods with Markov methods.
The linearization methods are based on the use of alternative Hilbert curves: adaptive Hilbert paths. The linearized spatial data thus constructed were then treated with Markov methods. These methods have the advantage of being usable both for machine learning and for generating new data, for example in the context of landscape simulation. The results show that the combination of these methods for learning and automatic generation of hedgerows captures some characteristics of the different study landscapes. The first simulations are encouraging despite the need for post-processing. Finally, this work has enabled the creation of a spatial data mining method based on different tools that support all the stages of a classic KDD process, from data selection to the visualization of results. Furthermore, this method was constructed in such a way that it can also be used for data generation, a component necessary for the simulation of landscapes
Ventura, Quentin. "Technique de visualisation hybride pour les données spatio-temporelles". Mémoire, École de technologie supérieure, 2014. http://espace.etsmtl.ca/1298/1/VENTURA_Quentin.pdf.
Texto completoLiu, Xueliang. "Fouille d'informations multimédia partagées orienté événements". Thesis, Paris, ENST, 2012. http://www.theses.fr/2012ENST0071/document.
Texto completo
The exponential growth of social media data requires scalable, effective and robust technologies to manage and index them. Events are among the most important cues for recalling people's past memories. With the development of Web 2.0, many event-based information sharing sites have appeared online, and a wide variety of events are scheduled and described by several online social services. The study of the relation between social media and events can leverage event domain knowledge and ontologies to formulate the problems raised, and it can also exploit multimodal features to mine patterns more deeply, hence gaining better performance compared with other methods. In this thesis, we study the problem of mining relations between events and social media data. Three main problems are investigated. The first is event enrichment, in which we investigate how to leverage social media to illustrate events. The second is event discovery, which focuses on discovering event patterns from social media streams; we propose burst detection and topic model based methods to find events from spatially and temporally labeled social media. The third is visual event modeling, which studies the problem of automatically collecting training samples to model the visual appearance of events; the solution for collecting both positive and negative samples is also derived from the analysis of the social media context. Thanks to the approaches proposed in this thesis, the intrinsic relationship between social media and events is deeply investigated, which provides a way to explore and organize online media effectively
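The burst-detection idea can be sketched with a simple moving-baseline detector. The threshold rule and window size below are illustrative assumptions, not the method developed in the thesis: a time step counts as a burst when its activity count exceeds the trailing mean by a few standard deviations.

```python
import statistics

def detect_bursts(counts, window=5, z=2.0):
    # Flag time steps whose count exceeds the trailing-window mean
    # by z standard deviations (a minimal burst-detection baseline).
    bursts = []
    for t in range(window, len(counts)):
        hist = counts[t - window:t]
        mu = statistics.fmean(hist)
        sd = statistics.pstdev(hist) or 1.0   # guard against flat history
        if counts[t] > mu + z * sd:
            bursts.append(t)
    return bursts
```

On a stream of per-hour photo or post counts, such a detector surfaces the sudden spikes that typically accompany real-world events, which a topic model can then label.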
Laurent, Anne. "Fouille de données complexes et logique floue : extraction de motifs à partir de bases de données multidimensionnelles". Habilitation à diriger des recherches, Université Montpellier II - Sciences et Techniques du Languedoc, 2009. http://tel.archives-ouvertes.fr/tel-00413140.
Texto completoMouhoubi, Karima. "Extraction des motifs contraints dans des données bruitées". Paris 13, 2013. http://www.theses.fr/2013PA132060.
Texto completoKaba, Bangaly. "Décomposition de graphes comme outil de regroupement et de visualisation en fouille de données". Clermont-Ferrand 2, 2008. http://www.theses.fr/2008CLF21871.
Texto completo