Dissertations / Theses on the topic 'Exploration de données cachées'
Consult the top 50 dissertations / theses for your research on the topic 'Exploration de données cachées.'
Hayat, Khizar. "Visualisation 3D adaptée par insertion synchronisée de données cachées." PhD thesis, Université Montpellier II - Sciences et Techniques du Languedoc, 2009. http://tel.archives-ouvertes.fr/tel-00400762.
Meuel, Peter. "Insertion de données cachées dans des vidéos au format H.264." Montpellier 2, 2009. http://www.theses.fr/2009MON20218.
This thesis targets two major issues caused by the massive adoption of the H.264 video format: the privacy issue raised by closed-circuit television and the need for secure and robust watermarking methods for video content. A first contribution addresses the privacy issue through the creation of a single video stream that restricts the visual information of the filmed faces to persons holding the appropriate key. The reported performance shows the usability of the method in video cameras. The second contribution, on robust watermarking, applies the state of the art in secure watermarking to video. Unlike encryption, the security of the method relies on keeping the insertion subspace secret. The work details the entire process of adapting this approach to the H.264 video format.
Liu, Zhenjiao. "Incomplete multi-view data clustering with hidden data mining and fusion techniques." Electronic Thesis or Diss., Institut polytechnique de Paris, 2023. http://www.theses.fr/2023IPPAS011.
Incomplete multi-view data clustering is a research direction that attracts attention in the fields of data mining and machine learning. In practical applications, we often face situations where only part of the modal data can be obtained or where values are missing. Data fusion is an important method for incomplete multi-view information mining. Solving incomplete multi-view information mining in a targeted manner, achieving flexible collaboration between visible views and shared hidden views, and improving robustness remain quite challenging. This thesis focuses on three aspects: hidden data mining, collaborative fusion, and enhancing the robustness of clustering. The main contributions are as follows.
1. Hidden data mining for incomplete multi-view data: existing algorithms cannot make full use of the observed information within and between views, resulting in the loss of a large amount of valuable information, so we propose a new incomplete multi-view clustering model, IMC-NLT (Incomplete Multi-view Clustering Based on NMF and Low-Rank Tensor Fusion), based on non-negative matrix factorization and low-rank tensor fusion. IMC-NLT first uses a low-rank tensor to retain view features in a unified dimension. Using a consistency measure, IMC-NLT captures a consistent representation across multiple views. Finally, IMC-NLT incorporates multiple learning tasks into a unified model such that hidden information can be extracted effectively from incomplete views. We conducted comprehensive experiments on five real-world datasets to validate the performance of IMC-NLT. The overall experimental results demonstrate that IMC-NLT performs better than several baseline methods, yielding stable and promising results.
2. Collaborative fusion for incomplete multi-view data: our approach to this issue is Incomplete Multi-view Co-Clustering by Sparse Low-Rank Representation (CCIM-SLR). The algorithm is based on sparse low-rank representation and subspace representation, in which missing data are jointly filled using data within a modality and related data from other modalities. To improve the stability of clustering results for multi-view data with different missing degrees, CCIM-SLR uses the Γ-norm model, an adjustable low-rank representation method. CCIM-SLR can alternate between learning the shared hidden view, the visible views, and the cluster partitions within a co-learning framework. An iterative algorithm with guaranteed convergence is used to optimize the proposed objective function. Compared with other baseline models, CCIM-SLR achieved the best performance in comprehensive experiments on the five benchmark datasets, particularly on those with varying degrees of incompleteness.
3. Enhancing clustering robustness for incomplete multi-view data: we propose a fusion of graph convolution and information bottlenecks (Incomplete Multi-view Representation Learning Through Anchor Graph-based GCN and Information Bottleneck, IMRL-AGI). First, we introduce information bottleneck theory to filter out noisy data with irrelevant details and retain only the most relevant feature items. Next, we integrate anchor-based graph structure information into both the shared information representation and the representation learning of each specific view, a process that balances and improves the robustness of the learned features. Finally, the model integrates multiple representations with the help of information bottlenecks, reducing the impact of redundant information in the data. Extensive experiments are conducted on several real-world datasets, and the results demonstrate the superiority of IMRL-AGI. Specifically, IMRL-AGI shows significant improvements in clustering and classification accuracy, even in the presence of high view missing rates (e.g. 10.23% and 24.1% respectively on the ORL dataset).
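As a rough illustration of the kind of pipeline sketched in the abstract above (not the IMC-NLT algorithm itself), the snippet below mean-imputes a partially missing view, factorizes each view with non-negative matrix factorization, and clusters the concatenated per-view factors; all array names, sizes and parameters are invented.

# Illustrative sketch only: clustering two partially observed "views" by
# mean-imputing missing rows, factorizing each view with NMF, and clustering
# the concatenated latent factors.
import numpy as np
from sklearn.decomposition import NMF
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_samples, k_clusters = 200, 4
view1 = rng.random((n_samples, 30))          # e.g. image features
view2 = rng.random((n_samples, 50))          # e.g. text features
mask2 = rng.random(n_samples) < 0.3          # 30% of samples miss view 2
view2[mask2] = np.nan

def impute_mean(view):
    """Replace missing entries by the column means of the observed rows."""
    filled = view.copy()
    col_means = np.nanmean(view, axis=0)
    missing = np.isnan(filled)
    filled[missing] = np.take(col_means, np.where(missing)[1])
    return filled

latents = []
for view in (view1, view2):
    factors = NMF(n_components=10, init="nndsvda", max_iter=500, random_state=0)
    latents.append(factors.fit_transform(impute_mean(view)))   # per-view factors

shared = np.hstack(latents)                   # crude "fusion" of the per-view factors
labels = KMeans(n_clusters=k_clusters, n_init=10, random_state=0).fit_predict(shared)
print(labels[:10])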
Rouan, Lauriane. "Apports des chaînes de Markov cachées à l'analyse de données de capture-recapture." Montpellier 2, 2007. http://www.theses.fr/2007MON20188.
Eng, Catherine. "Développement de méthodes de fouille de données basées sur les modèles de Markov cachés du second ordre pour l'identification d'hétérogénéités dans les génomes bactériens." Thesis, Nancy 1, 2010. http://www.theses.fr/2010NAN10041/document.
Second-order Hidden Markov Models (HMM2) are stochastic processes with a high efficiency in exploring bacterial genome sequences. Different types of HMM2 (M1M2, M2M2, M2M0) combined with combinatorial methods were developed in a new approach to discriminate genomic regions without a priori knowledge of their genetic content. This approach was applied to two bacterial models in order to validate its achievements: Streptomyces coelicolor and Streptococcus thermophilus. These bacterial species exhibit distinct genomic traits (base composition, global genome size) in relation to their ecological niche: soil for S. coelicolor and dairy products for S. thermophilus. In S. coelicolor, a first HMM2 architecture allowed the detection of short discrete DNA heterogeneities (5-16 nucleotides in size), mostly localized in intergenic regions. The application of the method to a biologically known gene set, the SigR regulon (involved in the oxidative stress response), proved its efficiency in identifying bacterial promoters. S. coelicolor shows a complex regulatory network (up to 12% of the genes may be involved in gene regulation) with more than 60 sigma factors involved in the initiation of transcription. A classification method coupled with a search algorithm (R'MES) was developed to automatically extract the box1-spacer-box2 composite DNA motifs, a structure corresponding to the typical bacterial promoter -35/-10 boxes. Among the 814 DNA motifs described for the whole S. coelicolor genome, those of sigma factors (B, WhiG) could be retrieved from the raw data. We could show that this method can be generalized by applying it successfully, in a preliminary attempt, to the genome of Bacillus subtilis.
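For intuition only, here is a much simpler stand-in than the HMM2 variants (M1M2, M2M2, M2M0) used in the thesis: a plain second-order Markov chain trained on a background sequence and used to score windows, where unusually low scores flag compositional heterogeneities. The sequences, pseudocount and window are invented.

# Illustrative sketch only: scoring a DNA window with a second-order Markov
# model, i.e. P(x_t | x_{t-2}, x_{t-1}), estimated from a background sequence.
from collections import defaultdict
from math import log

def train_order2(sequence, alphabet="ACGT", pseudo=1.0):
    counts = defaultdict(lambda: defaultdict(lambda: pseudo))
    for i in range(2, len(sequence)):
        counts[sequence[i-2:i]][sequence[i]] += 1
    probs = {}
    for ctx, nxt in counts.items():
        total = sum(nxt[a] for a in alphabet)
        probs[ctx] = {a: nxt[a] / total for a in alphabet}
    return probs

def log_likelihood(window, probs, alphabet="ACGT"):
    uniform = 1.0 / len(alphabet)
    score = 0.0
    for i in range(2, len(window)):
        ctx = window[i-2:i]
        score += log(probs.get(ctx, {}).get(window[i], uniform))
    return score

background = "ACGTGCGTACGTTGCACGTAGCTAGCTAGGCTAGCTAACGTGGC" * 50
model = train_order2(background)
# Windows with a log-likelihood far below the background average are candidate
# heterogeneities (e.g. putative promoter regions in the thesis setting).
print(log_likelihood("TTGACAATTTTAGCTATAAT", model))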
Itier, Vincent. "Nouvelles méthodes de synchronisation de nuages de points 3D pour l'insertion de données cachées." Thesis, Montpellier, 2015. http://www.theses.fr/2015MONTS017/document.
This thesis addresses issues relating to the protection of 3D object meshes. For instance, these objects can be created using a CAD tool developed by the company STRATEGIES. In an industrial context, creators of 3D meshes need tools to verify mesh integrity or to check permission for 3D printing, for example. In this context we study data hiding on 3D meshes. This approach allows us to insert information in a secure and imperceptible way in a mesh. This may be an identifier, meta-information or third-party content, for instance in order to secretly transmit a texture. Data hiding can address these problems by adjusting the trade-off between capacity, imperceptibility and robustness. Generally, data hiding methods consist of two stages, synchronization and embedding. The synchronization stage consists of finding and ordering the components available for insertion. One of the main challenges is to propose an effective synchronization method that defines an order on mesh components. In our work, we propose to use mesh vertices, specifically their geometric representation in space, as basic components for synchronization and embedding. We present three new synchronisation methods based on the construction of a Hamiltonian path in a vertex cloud. Two of these methods jointly perform the synchronization stage and the embedding stage. This is possible thanks to two new high-capacity embedding methods (from 3 to 24 bits per vertex) that rely on coordinate quantization. In this work we also highlight the constraints of this kind of synchronization. We analyze the different proposed approaches with several experimental studies. Our work is assessed on various criteria, including the capacity and imperceptibility of the embedding method. We also pay attention to the security aspects of the proposed methods.
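As a toy illustration of the two stages described above (not the algorithms of the thesis), the sketch below orders a vertex cloud with a greedy nearest-neighbour path and embeds one bit per vertex by quantizing the z coordinate; it deliberately ignores the fact that the embedding perturbs the path, an issue the thesis handles with joint synchronization and embedding. The step size and starting rule are arbitrary assumptions.

# Illustrative sketch only: greedy "Hamiltonian" ordering of a 3D vertex cloud
# used as synchronization, then a 1-bit-per-vertex embedding via quantization.
import numpy as np

def greedy_path(vertices):
    """Order vertices by repeatedly jumping to the nearest unvisited vertex."""
    remaining = list(range(len(vertices)))
    path = [remaining.pop(0)]            # start from vertex 0 (assumed convention)
    while remaining:
        last = vertices[path[-1]]
        dists = np.linalg.norm(vertices[remaining] - last, axis=1)
        path.append(remaining.pop(int(np.argmin(dists))))
    return path

def embed_bits(vertices, bits, step=1e-3):
    """Quantize z so that the parity of the quantization cell encodes one bit."""
    marked = vertices.copy()
    order = greedy_path(vertices)
    for bit, idx in zip(bits, order):
        q = np.floor(marked[idx, 2] / step)
        if int(q) % 2 != bit:
            q += 1                        # move to a cell with the right parity
        marked[idx, 2] = (q + 0.5) * step
    return marked, order

rng = np.random.default_rng(1)
cloud = rng.random((100, 3))
watermarked, order = embed_bits(cloud, bits=[1, 0, 1, 1, 0, 0, 1, 0])
# Extraction would rebuild the same path on the marked cloud and read the parities back.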
Lazrak, El Ghali. "Fouille de données stochastique pour la compréhension des dynamiques temporelles et spatiales des territoires agricoles. Contribution à une agronomie numérique." PhD thesis, Université de Lorraine, 2012. http://tel.archives-ouvertes.fr/tel-00782768.
Full textNegre, Elsa. "Exploration collaborative de cubes de données." Thesis, Tours, 2009. http://www.theses.fr/2009TOUR4023/document.
Data warehouses store large volumes of consolidated, historized multidimensional data to be explored and analysed by various users. Data exploration is a process of searching for relevant information within a data set. In our work, the data set to be explored is a data cube, an extract of the data warehouse that users query by launching sequences of OLAP (On-Line Analytical Processing) queries. However, this mass of information to explore can be very large and varied, so it is necessary to help users face it by guiding them in their exploration of the data cube so that they find relevant information. The work presented in this thesis aims to propose recommendations, in the form of OLAP queries, to a user querying a data cube. This proposal takes advantage of what other users have done during their previous explorations of the same data cube. We begin by presenting an overview of the framework and techniques used in Information Retrieval, Web Usage Mining and e-commerce. Then, drawing on this framework, we present a state of the art on supporting the exploration of (relational and multidimensional) databases. This allows us to identify lines of work in the context of multidimensional databases. We then propose a generic framework for generating recommendations, generic in the sense that the three steps of the process are parameterizable. Thus, from a set of query sequences, corresponding to the explorations of the data cube previously performed by different users, and from the current user's query sequence, our framework proposes a set of queries that may follow the current query sequence. Various instantiations of this framework are then proposed. We then present a prototype written in Java. It allows a user to specify the current query sequence and returns a set of recommendations. This prototype allows us to validate our approach and to check its effectiveness with a series of experiments. Finally, in order to improve this collaborative support for data cube exploration and to allow, in particular, query sharing, navigation among the queries posed on the data cube, or their annotation, we propose a framework for organizing queries. An instantiation adapted to the management of recommendations is presented.
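Purely as an illustration of the collaborative idea (not the Java prototype described above), the snippet below recommends the next OLAP query by counting which queries most often followed the current user's last query in past sessions; the session logs and query labels are invented.

# Illustrative sketch only: next-query recommendation from past exploration sessions.
from collections import Counter

past_sessions = [
    ["sales_by_region", "sales_by_region_2009", "sales_by_city_2009"],
    ["sales_by_region", "sales_by_region_2009", "profit_by_region_2009"],
    ["stock_by_product", "stock_by_product_q4"],
]

def recommend(current_session, sessions, top_k=2):
    last = current_session[-1]
    followers = Counter()
    for s in sessions:
        for prev, nxt in zip(s, s[1:]):
            if prev == last:
                followers[nxt] += 1
    return [q for q, _ in followers.most_common(top_k)]

print(recommend(["sales_by_region", "sales_by_region_2009"], past_sessions))
# -> ['sales_by_city_2009', 'profit_by_region_2009']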
Gaumer, Gaëtan. "Résumé de données en extraction de connaissances à partir des données (ECD) : application aux données relationnelles et textuelles." Nantes, 2003. http://www.theses.fr/2003NANT2025.
Maitre, Julien. "Détection et analyse des signaux faibles. Développement d’un framework d’investigation numérique pour un service caché Lanceurs d’alerte." Thesis, La Rochelle, 2022. http://www.theses.fr/2022LAROS020.
This manuscript provides the basis for a complete chain of document analysis for a whistleblower service, such as GlobalLeaks. We propose a chain of semi-automated analysis of text documents, enriched with web search queries, in order to ultimately present dashboards describing weak signals. We identify and solve methodological and technological barriers inherent to: 1) automated analysis of text documents with minimal a priori information, 2) enrichment of information using web search, and 3) data visualization in a dashboard and a 3D interactive environment. These static and dynamic approaches are used in the context of data journalism for processing heterogeneous types of information within documents. This thesis also proposes a feasibility study and a prototype implementing the processing chain as a software tool. This construction requires a definition of weak signals. Our goal is to provide a configurable and generic tool. Our solution is based on two approaches: static and dynamic. In the static approach, we propose a solution requiring less intervention from the domain expert. In this context, we propose a new approach of multi-level topic modeling. This joint approach combines topic modeling, word embedding and an algorithm. A domain expert helps to assess the relevance of the results and to identify topics containing weak signals. In the dynamic approach, we integrate a solution for monitoring weak signals and follow up on them to study their evolution. We therefore propose an agent-mining solution which combines data mining and a multi-agent system, where agents representing documents and words are animated by attraction/repulsion forces. The results are presented in a data visualization dashboard and a 3D interactive environment in Unity. First, the static approach is evaluated in a proof of concept with synthetic and real text corpora. Second, the complete chain of document analysis (static and dynamic) is implemented in a software tool and applied to data from document databases.
El, Ghaziri Angélina. "Relation entre tableaux de données : exploration et prédiction." Thesis, Nantes, Ecole nationale vétérinaire, 2016. http://www.theses.fr/2016ONIR088F/document.
The research developed in this thesis deals with several statistical aspects of analysing datasets. Firstly, investigations of the properties of several association indices commonly used by practitioners are undertaken. Secondly, different strategies related to the standardization of the datasets, with application to principal component analysis (PCA) and regression, especially PLS regression, were developed. The first strategy consists of a continuum standardization of the variables. The interest of such standardization in PCA and PLS regression is emphasized. A more general standardization is also discussed, which consists in gradually reducing not only the variances of the variables but also their correlations. Thereafter, a continuum approach was developed combining Redundancy Analysis and PLS regression. Moreover, this new standardization inspired a biased regression model in multiple linear regression. Properties related to this approach are studied and the results are compared, on the basis of case studies, with those of Ridge regression. In the context of the analysis of several datasets in an exploratory perspective, the method called ComDim has certainly raised interest among practitioners. An extension of this method for the analysis of K+1 datasets was developed. Properties related to this method, called P-ComDim, are studied and compared to Multiblock PLS. Finally, for the analysis of datasets depending on several factors, a new approach based on PLS regression is proposed.
Rommel, Cédric. "Exploration de données pour l'optimisation de trajectoires aériennes." Thesis, Université Paris-Saclay (ComUE), 2018. http://www.theses.fr/2018SACLX066/document.
This thesis deals with the use of flight data for the optimization of climb trajectories with respect to fuel consumption. We first focus on methods for identifying the aircraft dynamics, in order to plug them into the trajectory optimization problem. We suggest a static formulation of the identification problem, which we interpret as a structured multi-task regression problem. In this framework, we propose parametric models and use different maximum likelihood approaches to learn the unknown parameters. Furthermore, polynomial models are considered, and an extension of the bootstrap Lasso to the structured multi-task setting is used to make a consistent selection of the monomials despite the high correlations among them. Next, we consider the problem of assessing the optimized trajectories relative to the validity region of the identified models. For this, we propose a probabilistic criterion for quantifying the closeness between an arbitrary curve and a set of trajectories sampled from the same stochastic process. We propose a class of estimators of this quantity and prove their consistency in some sense. A nonparametric implementation based on kernel density estimators, as well as a parametric implementation based on Gaussian mixtures, are presented. We introduce the latter as a penalty term in the trajectory optimization problem, which allows us to control the trade-off between trajectory acceptability and consumption reduction.
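As a simplified stand-in for the bootstrap Lasso selection mentioned above (not the thesis implementation), the sketch below fits a Lasso on polynomial features over bootstrap resamples and keeps the monomials selected in at least 90% of them; the data, regularization level and threshold are invented.

# Illustrative sketch only: bootstrap-stabilized Lasso selection of monomials.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n = 300
X = rng.uniform(-1, 1, size=(n, 3))                  # e.g. speed, altitude, mass
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] * X[:, 2] + 0.05 * rng.standard_normal(n)

poly = PolynomialFeatures(degree=2, include_bias=False)
Z = poly.fit_transform(X)                            # candidate monomials
names = poly.get_feature_names_out(["x0", "x1", "x2"])

n_boot, counts = 100, np.zeros(Z.shape[1])
for _ in range(n_boot):
    idx = rng.integers(0, n, size=n)                 # bootstrap resample
    model = Lasso(alpha=0.01, max_iter=10000).fit(Z[idx], y[idx])
    counts += np.abs(model.coef_) > 1e-6

selected = [name for name, c in zip(names, counts) if c / n_boot >= 0.9]
print(selected)                                      # expected: ['x0', 'x1 x2']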
Saffar, Imen. "Vers une agentification de comportements observés : une approche originale basée sur l’apprentissage automatique pour la simulation d’un environnement réel." Thesis, Lille 1, 2013. http://www.theses.fr/2013LIL10190/document.
The design of simulation tools able to reproduce the dynamics and evolution of complex real phenomena is hard. Modeling these phenomena with analytical approaches is often unsuitable, forcing the designer to turn towards behavioral approaches. In this context, multi-agent simulations are now a credible alternative to classical simulations. However, they remain difficult to implement. In fact, the designer of the simulation must be able to transcribe the dynamics of the observed phenomenon into agent behaviors. This step usually requires the skills of a specialist with some expertise in the phenomenon to be simulated. In this thesis, we propose a novel way of processing observed real behaviors to be simulated, without resorting to the help of an expert. It relies on unsupervised learning techniques to identify and extract behaviors and to facilitate the agentification. Our approach is therefore a step towards the automatic design of multi-agent simulations reproducing observable phenomena. This approach is motivated by an application context aiming at the simulation of customers' behavior within a retail space.
Turmeaux, Teddy. "Contraintes et fouille de données." Orléans, 2004. http://www.theses.fr/2004ORLE2048.
Djedaini, Mahfoud. "Automatic assessment of OLAP exploration quality." Thesis, Tours, 2017. http://www.theses.fr/2017TOUR4038/document.
In a Big Data context, traditional data analysis is becoming more and more tedious. Many approaches have been designed and developed to support analysts in their exploration tasks. However, there is no automatic, unified method for evaluating the quality of support provided by these different approaches. Current benchmarks focus mainly on the evaluation of systems in terms of temporal, energy or financial performance. In this thesis, we propose a model, based on supervised machine learning methods, to evaluate the quality of an OLAP exploration. We use this model to build an evaluation benchmark for exploration support systems, the general principle of which is to allow these systems to generate explorations and then to evaluate them through the explorations they produce.
Syla, Burhan. "Relais de perte de synchronisme par exploration de données." Thesis, Université Laval, 2012. http://www.theses.ulaval.ca/2012/29102/29102.pdf.
The goal of this document is to verify the feasibility of an out-of-step relay using data mining and decision trees. Using EMTP-RV and the Anderson network, 180 simulations were run while changing the location of the short circuit, its length, its type, and the load flow. For these simulations, 39 electrical measurements and 8 mechanical measurements were made. These simulations were then classified as stable or unstable using the center of inertia of angle and speed. With MATLAB, 33 additional variables were created from the first 39, and then, with KNIME, decision trees such as C4.5, CART, AdaBoost, ADTree and random forests were simulated and the sampling time versus the performance was compared. Using Consistency Subset Eval, Symmetrical Uncert Attribute Set Eval and Correlation-based Feature Subset Selection, the features were reduced and the simulations were visualised using the validation set. Results show that a sampling frequency of 240 Hz and 28 variables are enough to obtain a mean area under the curve of 0.9591 for the training and validation sets of the 4 generators.
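As an illustration of the kind of classification experiment summarized above (not the thesis EMTP-RV/KNIME workflow), the snippet below trains a random forest on synthetic electrical features to label simulated cases as stable or unstable and reports the area under the ROC curve; all data and parameters are invented.

# Illustrative sketch only: stable/unstable classification with a random forest.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n_sim, n_features = 180, 28                    # mirrors the orders of magnitude above
X = rng.standard_normal((n_sim, n_features))   # e.g. voltages, currents, angles
y = (X[:, 0] + 0.8 * X[:, 3] - 0.5 * X[:, 7] + 0.3 * rng.standard_normal(n_sim)) > 0

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
scores = clf.predict_proba(X_test)[:, 1]
print("AUC:", round(roc_auc_score(y_test, scores), 3))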
Clech, Jérémie. "Contribution méthodologique à la fouille de données complexes." Lyon 2, 2004. http://theses.univ-lyon2.fr/documents/lyon2/2004/clech_j.
El, Golli Aicha. "Extraction de données symboliques et cartes topologiques : Application aux données ayant une structure complexe." Paris 9, 2004. https://portail.bu.dauphine.fr/fileviewer/index.php?doc=2004PA090026.
Sansen, Joris. "La visualisation d’information pour les données massives : une approche par l’abstraction de données." Thesis, Bordeaux, 2017. http://www.theses.fr/2017BORD0636/document.
The evolution and spread of technologies have led to a real explosion of information: our capacity to generate data and our need to analyse them have never been this strong. Still, the problems raised by such accumulation (storage, computation delays, diversity, speed of gathering/generation, etc.) are as acute as the data are big, complex and varied. Information visualization, by its ability to summarize and abridge data, was naturally established as an appropriate approach. However, it does not solve the problems raised by Big Data. Indeed, classical visualization techniques are rarely designed to handle such masses of information. Moreover, the problems raised by data storage and computation time have repercussions on the analysis system; for example, the increasing distance between the data and the analyst: the place where the data are stored and the place where the user performs the analyses are rarely close. In this thesis, we focus on these issues and more particularly on adapting information visualization techniques for Big Data. We first focus on relational data: how the existence of a relation between entities is conveyed, and how to improve this transmission for hierarchical data. Then, we focus on multivariate data and how to handle their complexity for the required computations. Finally, we present the methods we designed to make our techniques compatible with Big Data.
Boullé, Marc. "Recherche d'une représentation des données efficace pour la fouille des grandes bases de données." PhD thesis, Télécom ParisTech, 2007. http://pastel.archives-ouvertes.fr/pastel-00003023.
Braud, Agnès. "Fouille de données par algorithmes génétiques." Orléans, 2002. http://www.theses.fr/2002ORLE2011.
Aouiche, Kamel. "Techniques de fouille de données pour l'optimisation automatique des performances des entrepôts de données." Lyon 2, 2005. http://theses.univ-lyon2.fr/documents/lyon2/2005/aouiche_k.
With the development of databases in general and data warehouses in particular, it has become very important to reduce the cost of the administration function. The aim of self-administering systems is to administer and adapt themselves automatically, without loss or even with a gain in performance. The idea of using data mining techniques to extract, from the data themselves, useful knowledge for administration has been in the air for some years; however, no research had actually been carried out on it. As far as we know, it nevertheless remains a very promising approach, notably in the field of data warehousing, where queries are very heterogeneous and cannot be interpreted easily. The aim of this thesis is to study self-administration techniques in databases and data warehouses, mainly performance optimization techniques such as indexing and view materialization, and to look for a way of extracting, from the stored data themselves, useful knowledge for applying these techniques. We have designed a tool that finds an index and view configuration allowing data access time to be optimized. Our tool searches for frequent itemsets in a given workload and clusters the query workload to compute this index and view configuration. Finally, we have extended this performance optimization to XML data warehouses. In this area, we proposed an indexing technique that precomputes joins between XML facts and dimensions, and we adapted our materialized view selection strategy to XML materialized views.
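For intuition only (not the thesis tool), the snippet below counts frequent attribute combinations in a toy query workload, the kind of signal that can suggest composite indexes or materialized views; the workload and support threshold are invented.

# Illustrative sketch only: frequent attribute sets in a query workload.
from itertools import combinations
from collections import Counter

workload = [                                   # attributes referenced by each query
    {"store", "product", "month"},
    {"store", "product"},
    {"store", "month"},
    {"store", "product", "month"},
    {"customer", "month"},
]
min_support = 0.6                              # fraction of queries

counts = Counter()
for attrs in workload:
    for size in (2, 3):
        for itemset in combinations(sorted(attrs), size):
            counts[itemset] += 1

frequent = {s: c / len(workload) for s, c in counts.items()
            if c / len(workload) >= min_support}
print(frequent)      # e.g. {('product', 'store'): 0.6, ('month', 'store'): 0.6}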
Do, Thanh-Nghi. "Visualisation et séparateurs à vaste marge en fouille de données." Nantes, 2004. http://www.theses.fr/2004NANT2072.
We present different cooperative approaches using visualization methods and support vector machine (SVM) algorithms for knowledge discovery in databases (KDD). Most existing data mining approaches construct the model automatically; the user is not involved in the mining process. Furthermore, these approaches must be able to deal with the challenge of large datasets. Our work aims at increasing the human role in the KDD process (by way of visualization methods) and at improving the performance (execution time and memory requirements) of methods for mining large datasets. We present: parallel and distributed SVM algorithms for mining massive datasets; interactive graphical methods to explain SVM results; and cooperative approaches to involve the user more significantly in the model construction.
Legrand, Gaëlle. "Approche méthodologique de sélection et construction de variables pour l'amélioration du processus d'extraction des connaissances à partir de grandes bases de données." Lyon 2, 2004. http://theses.univ-lyon2.fr/documents/lyon2/2004/legrand_g.
Nowadays, because of the presence of very large databases, improving the quality of data representation is very important. Two types of feature transformation make it possible to extract relevant knowledge from data. Feature selection is a process which chooses an optimal feature subset according to a particular criterion and which reduces the feature space by removing non-relevant features. This transformation allows the reduction of the representation space, the elimination of noise and the elimination of redundancy. We propose a feature selection method, between the wrapper and filter approaches, which uses a preference aggregation method. The aggregation method enables us to obtain a feature list sorted by order of relevance thanks to the aggregation of the results of a set of myopic criteria. Feature construction is a process which discovers missing information in the relations between features and which increases the feature space by creating additional features. During the feature construction process, a set of operators is applied to an existing feature set, leading to the construction of one or more new features. We propose to build new features through the discovery of the underlying structure of the data. Indeed, it appears more relevant to us to concentrate on the relations existing between feature modalities rather than on the relations between the features themselves.
Masson, Cyrille. "Contribution au cadre des bases de données inductives : formalisation et évaluation des scénarios d'extraction de connaissances." Lyon, INSA, 2005. http://theses.insa-lyon.fr/publication/2005ISAL0042/these.pdf.
The success of database technologies has led to an ever-increasing mass of collected information in different application fields. Knowledge Discovery in Databases (KDD) aims at going further than classical querying processes on such data, so as to find hidden knowledge in these data, materialized in the form of patterns. The Inductive Database (IDB) concept is a generalization of the database concept which integrates patterns and data in a common framework. A KDD process can thus be seen as an extended querying process on an IDB. This PhD thesis is about the formalization and the evaluation of KDD scenarios in the IDB framework. We first show how to use an abstract language for IDBs to formally describe extraction processes that can be performed by the user. We thus obtain a prototypical scenario, i.e., a theoretical object made of a sequence of inductive queries on which it is possible to reason. Such a scenario is useful to formalize processes when transferring expertise between end users and KDD experts. Another application of the concept of scenario is the evaluation, on a common basis, of different implementations of IDBs, similarly to existing benchmarks for databases. An evaluation scenario has the same form as a prototypical scenario, but it focuses more on algorithmic issues and optimization techniques for sequences of inductive queries. When computing an execution plan for such a scenario, the IDB system should analyze the properties of the queries composing it, by discovering dependencies between them or conjunctions of constraints for which it is useful to have efficient extraction tools. Finally, we present an evaluation scenario in the field of bioinformatics, and we show how to solve it by using techniques developed in our group or especially designed for the needs of this scenario.
Mitašiūnaite, Ieva. "Mining string data under similarity and soft-frequency constraints : application to promoter sequence analysis." Lyon, INSA, 2009. http://theses.insa-lyon.fr/publication/2009ISAL0036/these.pdf.
An inductive database is a database that contains not only data but also patterns. Inductive databases are designed to support the KDD process. Recent advances in inductive database research have given rise to generic solvers capable of solving inductive queries that are arbitrary Boolean combinations of anti-monotonic and monotonic constraints. They are designed to mine different types of patterns (i.e., patterns from different pattern languages). An instance of such a generic solver exists that is capable of mining string patterns from string data sets. In our main application, promoter sequence analysis, there is a requirement to handle fault tolerance, as the data intrinsically contain errors and the phenomenon we are trying to capture is fundamentally degenerate. Our research contribution to fault-tolerant pattern extraction in string data sets is the use of a generic solver, based on a non-trivial formalisation of fault-tolerant pattern extraction as a constraint-based mining task. We identified the stages in the extraction process of such patterns where state-of-the-art strategies can be applied to prune the search space. We then developed a fault-tolerant pattern match function, InsDels, that generic constraint-solving strategies can soundly tackle. We also focused on making local patterns actionable. The bottleneck of most local pattern extraction methods is the burden of spurious patterns. As the analysis of patterns by application domain experts is time consuming, we cannot afford to present patterns without any objective clue about their relevancy. Therefore we have developed two methods for computing the expected number of patterns extracted in random data sets. If the number of extracted patterns is strongly different from the number expected in random data sets, one can state that the results exhibit local associations that are a priori relevant because they are unexpected. Among other applications, we have applied our approach to support the discovery of new motifs in gene promoter sequences, with promising results.
Jouve, Pierre-Emmanuel. "Apprentissage non supervisé et extraction de connaissances à partir de données." Lyon 2, 2003. http://theses.univ-lyon2.fr/documents/lyon2/2003/jouve_pe.
Mokrane, Abdenour. "Représentation de collections de documents textuels : application à la caractérisation thématique." Montpellier 2, 2006. http://www.theses.fr/2006MON20162.
Huynh, Hiep Xuan. "Interestingness measures for association rules in a KDD process : postprocessing of rules with ARQAT tool." Nantes, 2006. http://www.theses.fr/2006NANT2110.
This work takes place in the framework of Knowledge Discovery in Databases (KDD), often called "Data Mining". This domain is both a main research topic and an application field in companies. KDD aims at discovering previously unknown and useful knowledge in large databases. In the last decade, much research has been published about association rules, which are frequently used in data mining. Association rules, which are implicative tendencies in data, have the advantage of being an unsupervised model. But, in counterpart, they often deliver a large number of rules. As a consequence, a postprocessing task is required by the user to help him or her understand the results. One way to reduce the number of rules - to validate or to select the most interesting ones - is to use interestingness measures adapted to both the user's goals and the dataset studied. Selecting the right interestingness measures is an open problem in KDD. A lot of measures have been proposed to extract knowledge from large databases, and many authors have introduced interestingness properties for selecting a suitable measure for a given application. Some measures are adequate for some applications but others are not. In our thesis, we propose to study the set of interestingness measures available in the literature, in order to evaluate their behavior according to the nature of the data and the preferences of the user. The final objective is to guide the user's choice towards the measures best adapted to his or her needs and, in fine, to select the most interesting rules. For this purpose, we propose a new approach implemented in a new tool, ARQAT (Association Rule Quality Analysis Tool), in order to facilitate the analysis of the behavior of about 40 interestingness measures. In addition to elementary statistics, the tool allows a thorough analysis of the correlations between measures using correlation graphs based on the coefficients suggested by Pearson, Spearman and Kendall. These graphs are also used for identifying clusters of similar measures. Moreover, we propose a series of comparative studies on the correlations between interestingness measures on several datasets. We discovered a set of correlations that are not very sensitive to the nature of the data used, which we call stable correlations. Finally, 14 complementary graphical views structured on 5 levels of analysis (ruleset analysis, correlation and clustering analysis, most interesting rules analysis, sensitivity analysis, and comparative analysis) are illustrated in order to show the interest of both the exploratory approach and the use of complementary views.
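To make the notion of interestingness measures, and of correlations between them, concrete (this is not ARQAT code), the snippet below computes support, confidence and lift for a few made-up rules and the Spearman rank correlation between two of the measures.

# Illustrative sketch only: a few rule-quality measures and their rank correlation.
import numpy as np
from scipy.stats import spearmanr

# Each rule A -> B summarized by (n, n_A, n_B, n_AB): total transactions,
# support counts of A, of B, and of A and B together.
rules = [
    (1000, 300, 400, 180),
    (1000, 250, 500, 200),
    (1000, 120, 100,  60),
    (1000, 600, 700, 450),
]

def measures(n, n_a, n_b, n_ab):
    support = n_ab / n
    confidence = n_ab / n_a
    lift = confidence / (n_b / n)
    return support, confidence, lift

table = np.array([measures(*r) for r in rules])
rho, pval = spearmanr(table[:, 1], table[:, 2])   # confidence vs. lift
print("confidence:", table[:, 1].round(2))
print("lift      :", table[:, 2].round(2))
print("Spearman correlation between the two measures:", round(rho, 2))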
Chambefort, Françoise. "Mimèsis du flux, exploration des potentialités narratives des flux de données." Thesis, Bourgogne Franche-Comté, 2020. http://www.theses.fr/2020UBFCC004.
Sometimes called stream art or data art, digital art seizes data streams as its raw material. Choosing a path of creative research, this thesis explores the story-telling potentialities of data streams. Structured around technical, social, semiotic and aesthetic approaches, its thinking draws on various fields of study: information and communication sciences, but also computer science, cognitive science, philosophy, sociology and narratology. The work Lucette, Gare de Clichy was especially designed to answer the research question. The conformation of the work allowed for two different versions of it: a screen version and a performance. It is studied in all its stages, from its creation process to the public's response to it. Jonathan Fletcher Moore's installation, Artificial Killing Machine, is also analyzed. First, our object of research - stories made from a real-time data stream - is defined, and the concept of data mills is crafted to refer to this type of work. Then four hypotheses are formulated and individually verified. If data mills are to be able to form a narrative representation, they must free themselves from the logic of action. Thus fiction can become powered by reality. The metaphor that links the data originating in reality and the crafted fiction generates in the viewer a shifting of focus between what is compared and what compares. This switching metaphor has the power to reinforce the meaning it carries. Data mills are therefore able to convey the contingency of life as experienced by a vulnerable individual, tossed back and forth between objective and subjective time.
Ouksili, Hanane. "Exploration et interrogation de données RDF intégrant de la connaissance métier." Thesis, Université Paris-Saclay (ComUE), 2016. http://www.theses.fr/2016SACLV069.
An increasing number of datasets are published on the Web, expressed in the languages proposed by the W3C to describe Web data, such as RDF, RDF(S) and OWL. The Web has become an unprecedented source of information available to users and applications, but the meaningful usage of this information source is still a challenge. Querying these data sources requires knowledge of a formal query language such as SPARQL, but it mainly suffers from the lack of knowledge about the source itself, which is required in order to target the resources and properties relevant for the specific needs of the application. The work described in this thesis addresses the exploration of RDF data sources. This exploration is done in two complementary ways: discovering the themes or topics representing the content of the data source, and providing support for an alternative way of querying the data sources by using keywords instead of a query formulated in SPARQL. Theme discovery from an RDF dataset consists in identifying a set of sub-graphs which are not necessarily disjoint, such that each one represents a set of semantically related resources forming a theme from the point of view of the user. These themes can be used to enable a thematic exploration of the data source where users can target the relevant theme and limit their exploration to the resources composing this theme. Keyword search is a simple and intuitive way of querying data sources. In the case of RDF datasets, this search raises several problems, such as indexing graph elements, identifying the graph fragments relevant for a specific query, aggregating these relevant fragments to build the query results, and ranking these results. In our work, we address these different problems and we propose an approach which takes a keyword query as input and provides a list of sub-graphs, each one representing a candidate result for the query. These sub-graphs are ordered according to their relevance to the query. For both keyword search and theme identification in RDF data sources, we have taken into account some external knowledge in order to capture the users' needs, or to bridge the gap between the concepts invoked in a query and those of the data source. This external knowledge could be domain knowledge allowing the user's need expressed by a query, or the definition of themes, to be refined. In our work, we have proposed a formalization of this external knowledge and have introduced the notion of pattern to this end. These patterns represent equivalences between properties and paths in the dataset. They are evaluated and integrated into the exploration process to improve the quality of the results.
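Purely as an illustration of keyword access to RDF data (far simpler than the indexing, aggregation and ranking discussed above), the snippet below loads a tiny made-up Turtle graph with rdflib and returns the triples whose literals contain a keyword.

# Illustrative sketch only: naive keyword search over an RDF graph (requires rdflib).
from rdflib import Graph, Literal

ttl = """
@prefix ex: <http://example.org/> .
ex:thesis1 ex:title "Exploration de donnees RDF" ; ex:year "2016" .
ex:thesis2 ex:title "Fouille de motifs sequentiels" ; ex:year "2004" .
"""

g = Graph()
g.parse(data=ttl, format="turtle")

def keyword_search(graph, keyword):
    keyword = keyword.lower()
    hits = []
    for s, p, o in graph:
        if isinstance(o, Literal) and keyword in str(o).lower():
            hits.append((s, p, o))
    return hits

for s, p, o in keyword_search(g, "exploration"):
    print(s, p, o)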
Karoui, Lobna. "Extraction contextuelle d'ontologie par fouille de données." Paris 11, 2008. http://www.theses.fr/2008PA112220.
Ben, Messaoud Riadh. "Couplage de l'analyse en ligne et de la fouille de données pour l'exploration, l'agrégation et l'explication des données complexes." Lyon 2, 2006. http://theses.univ-lyon2.fr/documents/lyon2/2006/benmessaoud_r.
Data warehouses provide efficient solutions for the management of huge amounts of data. Online analytical processing (OLAP) is a key feature of data warehouses which provides users with visual tools to explore data cubes. Therefore, users are capable of extracting relevant information for their decision-making. On the other hand, data mining offers automatic learning techniques in order to come out with comprehensive knowledge covering descriptions, clusterings and explanations. The idea of combining online analytical processing and data mining is a promising solution to improve the decision-making process, especially in the case of complex data. In fact, OLAP and data mining can be two complementary fields that interact within a single analysis process. The aim of this thesis is to propose new approaches for decision support based on coupling online analytical processing and data mining. In order to do so, we have established three main proposals. The first one concerns the visualization of sparse data. Using multiple correspondence analysis, we reduce the negative effect of sparsity by reorganizing the cells of a data cube. Our second proposal provides a new aggregation of facts in a data cube by using agglomerative hierarchical clustering. The obtained aggregates are semantically richer than those provided by traditional multidimensional structures. Our third proposal tries to explain possible relationships within multidimensional data by using association rules. We have designed a new algorithm for guided mining of association rules in data cubes. We have also developed a software platform which includes our theoretical contributions. In addition, we provide a case study on complex data in order to validate our approaches. Finally, based on an OLAP algebra, we have designed first principles towards a general formal framework which models the problem of coupling online analytical processing and data mining.
Le, Corre Laure. "Données actuelles de l'exploration fonctionnelle thyroïdienne." Paris 5, 1992. http://www.theses.fr/1992PA05P133.
Charantonis, Anastase Alexandre. "Méthodologie d'inversion de données océaniques de surface pour la reconstitution de profils verticaux en utilisant des chaînes de Markov cachées et des cartes auto-organisatrices." Paris 6, 2013. http://www.theses.fr/2013PA066761.
Satellite observations provide us with the values of different biogeochemical parameters at the surface layer of the ocean. These observations are highly correlated with the underlying vertical profiles of different oceanic parameters, such as the chlorophyll-a concentration, or the salinity and temperature of the water column. The sea-surface data and the vertical profiles of the oceanic parameters constitute multi-dimensional vectors. Due to their multi-dimensionality and the high complexity of the dynamics connecting these data sets, their links cannot be modeled linearly. In this thesis we present a methodology to statistically invert sea-surface observations in order to retrieve these vertical profiles. The developed methodology, named PROFHMM, makes use of Self-Organizing Maps in order to render the inversion problem compatible with the Hidden Markov Model formalism. PROFHMM makes full use of the topological aspect of the Self-Organizing Maps, not only to generate the topology and states of the Hidden Markov Model, but also to improve the estimation of the probabilities essential to the accuracy of the model. The use of Self-Organizing Maps was essential in obtaining the results for the geophysical applications of PROFHMM presented in this manuscript. The manuscript is structured in three chapters, each consisting of an article. In the first one, the general methodology of PROFHMM is developed, then tested for the retrieval of vertical profiles of chlorophyll-a by inverting sea-surface observations. This application demonstrated the ability to synchronize sea-surface data with the output data of numerical models. The second article presents the application of PROFHMM to the inversion of sea-surface data obtained from the AVISO and NOAA projects, in order to retrieve the vertical profiles of temperature along the rail of the ARAMIS mission. The performances obtained demonstrate the ability of PROFHMM to synchronize sea-surface data with in-situ measurements. Finally, in the third article, we present a modification of the Viterbi algorithm in order to take into account a priori knowledge of the quality of the observations when performing reconstructions. The proposed methodology, named PROFHMM_UNC, was applied to the reconstruction of the temporal evolution of sea-surface data, by taking into account the quality of the satellite observations used. The validity of the method was proven by performing a twin experiment on the outputs of a numerical model.
Rannou, Éric. "Modélisation explicative de connaissances à partir de données." Toulouse 3, 1998. http://www.theses.fr/1998TOU30290.
Aboa, Yapo Jean-Pascal. "Méthodes de segmentation sur un tableau de variables aléatoires." Paris 9, 2002. https://portail.bu.dauphine.fr/fileviewer/index.php?doc=2002PA090042.
Blachon, Sylvain. "Exploration des données SAGE par des techniques de fouille de données en vue d'extraire des groupes de synexpression impliqués dans l'oncogénèse." Lyon, INSA, 2007. http://theses.insa-lyon.fr/publication/2007ISAL0034/these.pdf.
With the development of high-throughput molecular biology techniques, the accumulation of huge quantities of data raises new methodological and theoretical questions, in biology and in computer science. These questions open the field of study of the complexity of life. This work is part of this bioinformatics framework. Essentially, our contribution resides in the study and querying of human SAGE data from the Cancer Genome Anatomy Project. We studied in depth the specific qualities of these data and the biological questions that can be asked of them. To answer these, several data mining methods were needed. Each question demanded the design of an original data mining scenario. Their setting-up was based on the use of several data mining algorithms dedicated to the extraction of local set patterns in databases, especially the ones developed by the partners involved in a French national project, the ACI BINGO. The biological questions and the particular shape of SAGE data confronted us with various technological issues that are now fixed or at least delimited. A special effort was made to post-process the extracted local patterns and to interpret them. In particular, a clustering method to aggregate similar local patterns was proposed to ease the identification of relevant patterns from a biologist's point of view. The impact of all these methodological elements was validated in a work of interpretation of QSGs in order to propose new hypotheses on sets of genes simultaneously over-expressed in cancerous situations.
Méger, Nicolas. "Recherche automatique des fenêtres temporelles optimales des motifs séquentiels." Lyon, INSA, 2004. http://theses.insa-lyon.fr/publication/2004ISAL0095/these.pdf.
This work addresses the problem of mining patterns under constraints in event sequences. The extracted patterns are episode rules. Our main contribution is an automatic search for the optimal time window of each episode rule. We propose to extract only rules having such an optimal time window. These rules are termed FLM-rules. We present an algorithm, WinMiner, that aims to extract FLM-rules given a minimum support threshold, a minimum confidence threshold and a maximum gap constraint. Proofs of the correctness of this algorithm are supplied. We also propose a dedicated interest measure that aims to select FLM-rules whose heads and bodies can be considered as dependent. Two applications are described: the first one is about mining medical datasets, while the other one deals with seismic datasets.
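As a rough analogue of the window search described above (much simpler than WinMiner and its FLM-rules), the snippet below measures the support and confidence of a toy episode rule A -> B for increasing window widths and keeps the smallest window reaching the maximal confidence; the event sequence is invented.

# Illustrative sketch only: support/confidence of A -> B as a function of window width.
events = [(1, "A"), (3, "B"), (4, "A"), (9, "B"), (10, "A"), (11, "B"), (20, "A")]

def rule_stats(events, head="A", body="B", window=5):
    occurrences, confirmed = 0, 0
    for t, e in events:
        if e == head:
            occurrences += 1
            if any(e2 == body and t < t2 <= t + window for t2, e2 in events):
                confirmed += 1
    support = confirmed
    confidence = confirmed / occurrences if occurrences else 0.0
    return support, confidence

best = None
for w in range(1, 11):
    support, confidence = rule_stats(events, window=w)
    if best is None or confidence > best[2]:
        best = (w, support, confidence)
print("smallest window with maximal confidence:", best)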
Bykowski, Artur. "Condensed representations of frequent sets : application to descriptive pattern discovery." Lyon, INSA, 2002. http://theses.insa-lyon.fr/publication/2002ISAL0053/these.pdf.
Interesting pattern discovery has recently seen impressive progress, due to increasing pressure from the owners of large data sets and to the response of scientists through numerous theoretical and practical results. Most of the data sets addressed at the beginning of this surge were sales data, and the interesting patterns were in the form of association rules. Very efficient solutions to this practical problem were elaborated, at the root of which was the so-called APRIORI algorithm. Then, the owners of other types of data wondered whether these basic methods could help them. Unfortunately, their data were different, and often these applications could not take advantage of APRIORI. The research following the elaboration of the basic solution addressed the important application areas where the basic solution could not be used. We addressed the problems of mining frequent patterns in different applicative contexts, especially the problems related to the large number of interesting frequent patterns present in data that are not similar to sales data. Our methods mine a collection of patterns that may be quite different from the target pattern collection, and hopefully much more efficient to mine in some types of data. Moreover, that different pattern collection must allow a subsequent "regeneration" of the target collection in a very efficient manner. Since the intermediate representation will often be smaller than the target collection, we call it a condensed representation. We obtained a significant improvement in performance. The use of condensed representations is relatively novel in the field. New major condensed representations of simple frequent patterns are then proposed, together with the algorithms to mine them and to derive the target pattern collections. We show the practical advantages of the proposed condensed representations over past methods, and provide an abstract view of the proposed representations within a unified structure for condensed representations.
Jollois, François-Xavier. "Contribution de la classification automatique à la fouille de données." Metz, 2003. http://docnum.univ-lorraine.fr/public/UPV-M/Theses/2003/Jollois.Francois_Xavier.SMZ0311.pdf.
Couturier, Olivier. "Contribution à la fouille de données : règles d'association et interactivité au sein d'un processus d'extraction de connaissances dans les données." Artois, 2005. http://www.theses.fr/2005ARTO0410.
Berasaluce, Sandra. "Fouille de données et acquisition de connaissances à partir de bases de données de réactions chimiques." Nancy 1, 2002. http://docnum.univ-lorraine.fr/public/SCD_T_2002_0266_BERASALUCE.pdf.
Chemical reaction databases, indispensable tools for synthetic chemists, are not free from flaws. In this thesis, we have tried to overcome the limits of these databases by adding knowledge which structures the data. This allows us to consider new, efficient ways of querying these databases. In the end, the goal is to design systems having the functionalities of both databases and knowledge-based systems. In the knowledge acquisition process, we emphasized the modelling of chemical objects. Thus, we were interested in synthetic methods, which we have described in terms of synthetic objectives. Afterwards, we relied on the elaborated model to apply data mining techniques and to extract knowledge from chemical reaction databases. The experiments we performed with Resyn Assistant concerned the synthetic methods which construct monocycles and the functional interchanges, and gave trends in good agreement with the domain knowledge.
Merroun, Omar. "Traitement à grand échelle des données symboliques." Paris 9, 2011. http://www.theses.fr/2011PA090027.
Symbolic Data Analysis (SDA) proposes a generalization of classical Data Analysis (DA) methods to complex data (intervals, sets, histograms). These methods define high-level, complex operators for symbolic data manipulation. However, recent implementations of the SDA model are not able to process large data volumes. Following the classical design of massive data computation, we define a new data model to represent and process symbolic data using algebraic operators that are minimal and closed under composition. We give some sample queries to emphasize the expressiveness of our model. We implement this algebraic model, called LS-SODAS, and we define the language XSDQL to express queries for symbolic data manipulation. Two case studies are provided in order to show the expressiveness of the XSDQL language and the scalability of the data processing.
Favre, Cécile. "Evolution de schémas dans les entrepôts de données : mise à jour de hiérarchies de dimension pour la personnalisation des analyses." Lyon 2, 2007. http://theses.univ-lyon2.fr/documents/lyon2/2007/favre_c.
In this thesis, we propose a solution to personalize analyses in data warehousing. This solution is based on schema evolution driven by users. More precisely, it consists in acquiring users' knowledge and integrating it into the data warehouse to build new analysis axes. To achieve that, we propose a formal model of a rule-based evolving data warehouse; the rules are named aggregation rules. To exploit this model, we propose an architecture that supports the personalization process. This architecture includes four modules: acquisition of users' knowledge in the form of if-then rules; integration of these rules into the data warehouse; schema evolution; and on-line analysis on the new schema. To realize this architecture, we propose an execution model in the relational context to deal with the whole process of the global architecture. Besides, we are interested in the evaluation of our evolving model. To do that, we propose a method for incrementally updating a given workload in response to the data warehouse schema evolution. To validate our proposals, we developed the WEDriK (data Warehouse Evolution Driven by Knowledge) platform. The problems evoked in this thesis come from the real case of the LCL bank.
Cerf, Loïc. "Constraint-based mining of closed patterns in noisy n-ary relations." Lyon, INSA, 2010. http://theses.insa-lyon.fr/publication/2010ISAL0050/these.pdf.
New knowledge discovery processes can be based on local patterns extracted from large datasets. Designing efficient data mining algorithms to compute collections of relevant patterns is an active research area. Many datasets record whether or not objects exhibit certain properties, for example whether a product is bought by a customer or whether a gene is over-expressed in a biological sample. These datasets are binary relations and can be represented as 0/1 matrices. In such matrices, a closed set is a maximal rectangle of '1's modulo arbitrary permutations of the rows (objects) and columns (properties). Thus, every closed set supports the discovery of a maximal subset of objects sharing the same maximal subset of properties. The efficient extraction of all closed sets satisfying user-defined relevance constraints has been studied in depth. Despite its success in many application domains, this framework often turns out to be too narrow. First of all, many datasets are n-ary relations, i.e., 0/1 tensors. Reducing their analysis to two dimensions means ignoring potentially interesting additional dimensions, for example where a customer buys a product (spatial analysis) or when the expression of a gene is measured (kinetic analysis). The presence of noise in most real-life datasets is a second problem, which leads to the fragmentation of the patterns to be discovered. The definition of a closed set is easily generalized to make it applicable to relations of higher arity and tolerant to noise (a maximal hyper-rectangle with an upper bound on the number of '0's tolerated per hyperplane). On the contrary, generalizing their extraction is very hard. Indeed, classical algorithms exploit a mathematical property (the Galois connection) of closed sets that neither of the two generalizations preserves. This is why our extractor traverses the space of candidate patterns in an original way that does not favour any dimension. This search can be guided by a very large class of relevance constraints that the patterns must satisfy. In particular, this thesis studies constraints specifically designed for mining almost-persistent quasi-cliques in dynamic graphs. Our extractor is several orders of magnitude more efficient than existing algorithms restricted to mining exact patterns in ternary relations or to mining error-tolerant patterns in binary relations. Despite these results, such an exhaustive approach often cannot, in a reasonable amount of time, tolerate all the noise contained in the dataset. In that case, complementing the extraction with a hierarchical agglomeration of the patterns (which do not tolerate enough noise) improves the quality of the returned pattern collections.
Tanasa, Doru. "Web usage mining : contributions to intersites logs preprocessing and sequential pattern extraction with low support." Nice, 2005. http://www.theses.fr/2005NICE4019.
Web usage mining (WUM) is a rather recent research field that corresponds to the process of knowledge discovery from databases (KDD) applied to Web usage data. It comprises three main stages: the pre-processing of raw data, the discovery of patterns, and the analysis (or interpretation) of the results. The quantity of Web usage data to be analysed and its low quality (in particular the absence of structure) are the principal problems in WUM. When applied to these data, the classic data mining algorithms generally give disappointing results in terms of the behaviours of Web site users (e.g., obvious sequential patterns, devoid of interest). In this thesis, we bring two significant contributions to a WUM process, both implemented in our toolbox, the Axislogminer. First, we propose a complete methodology for pre-processing Web logs whose originality consists in its intersite aspect. We propose four distinct steps in our methodology: data fusion, data cleaning, data structuration and data summarization. Our second contribution aims at discovering, from a large pre-processed log file, the minority behaviours corresponding to sequential patterns with low support. For that, we propose a general methodology aiming at dividing the pre-processed log file into a series of sub-logs. Based on this methodology, we designed three approaches for extracting sequential patterns with low support (the sequential, iterative and hierarchical approaches). These approaches were implemented in concrete hybrid methods using clustering and sequential pattern mining algorithms.
Berri, Jawad Abdelfettah. "Contribution à la méthode d'exploration contextuelle : applications au résumé automatique et aux représentations temporelles réalisation informatique du système SERAPHIN." Paris 4, 1996. http://www.theses.fr/1996PA040041.
Novelli, Noël. "Extraction de dépendances fonctionnelles : Une approche Data Mining." Aix-Marseille 2, 2000. http://www.theses.fr/2000AIX22071.
Full textHurter, Christophe. "Caractérisation de visualisations et exploration interactive de grandes quantités de données multidimensionnelles." Phd thesis, Université Paul Sabatier - Toulouse III, 2010. http://tel.archives-ouvertes.fr/tel-00610623.