Theses / dissertations on the topic "Bruit des ensembles de données"
Create an accurate reference in APA, MLA, Chicago, Harvard, and other styles
See the 50 best works (theses / dissertations) for research on the topic "Bruit des ensembles de données".
Next to each source in the reference list there is an "Add to bibliography" button. Click it, and we will automatically generate the bibliographic citation of the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the scientific publication in .pdf format and read the abstract of the work online, if it is present in the metadata.
Browse theses / dissertations from a wide range of scientific fields and compile a correct bibliography.
Al Jurdi, Wissam. "Towards next generation recommender systems through generic data quality". Electronic Thesis or Diss., Bourgogne Franche-Comté, 2024. http://www.theses.fr/2024UBFCD005.
Recommender systems are essential for filtering online information and delivering personalized content, thereby reducing the effort users need to find relevant information. They can be content-based, collaborative, or hybrid, each with a unique recommendation approach. These systems are crucial in various fields, including e-commerce, where they help customers find pertinent products, enhancing user experience and increasing sales. A significant aspect of these systems is the concept of unexpectedness, which involves discovering new and surprising items. This feature, while improving user engagement and experience, is complex and subjective, requiring a deep understanding of serendipitous recommendations for its measurement and optimization. Natural noise, an unpredictable data variation, can influence serendipity in recommender systems. It can introduce diversity and unexpectedness in recommendations, leading to pleasant surprises. However, it can also reduce recommendation relevance, causing user frustration. Therefore, it is crucial to design systems that balance natural noise and serendipity. Inconsistent user information due to natural noise can negatively impact recommender systems, leading to lower-quality recommendations. Current evaluation methods often overlook critical user-oriented factors, making noise detection a challenge. To provide powerful recommendations, it is important to consider diverse user profiles, eliminate noise in datasets, and effectively present users with relevant content from vast data catalogs. This thesis emphasizes the role of serendipity in enhancing recommender systems and preventing filter bubbles. It proposes serendipity-aware techniques to manage noise, identifies algorithm flaws, suggests a user-centric evaluation method, and proposes a community-based architecture for improved performance. It highlights the need for a system that balances serendipity and considers natural noise and other performance factors. The objectives, experiments, and tests aim to refine recommender systems and offer a versatile assessment approach.
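The "natural noise" discussed in this abstract can be illustrated with a small, hypothetical sketch: a rating is flagged as possible noise when it deviates strongly from both the user's average and the item's average. This is an illustrative heuristic only, not the serendipity-aware method proposed in the thesis; the function name and threshold are invented.

```python
# Illustrative heuristic (not the thesis's method): flag a rating as
# possible "natural noise" when it deviates from both the user's mean
# and the item's mean by more than a threshold.
from statistics import mean

def flag_natural_noise(ratings, threshold=1.5):
    """ratings: dict mapping (user, item) -> score."""
    by_user, by_item = {}, {}
    for (u, i), r in ratings.items():
        by_user.setdefault(u, []).append(r)
        by_item.setdefault(i, []).append(r)
    user_mean = {u: mean(v) for u, v in by_user.items()}
    item_mean = {i: mean(v) for i, v in by_item.items()}
    return {
        (u, i)
        for (u, i), r in ratings.items()
        if abs(r - user_mean[u]) > threshold
        and abs(r - item_mean[i]) > threshold
    }
```

On a toy ratings matrix, only an entry that is atypical for both its user and its item is flagged; consistent low or high raters are left alone.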
Durand, Marianne. "Combinatoire analytique et algorithmique des ensembles de données". PhD thesis, Ecole Polytechnique X, 2004. http://pastel.archives-ouvertes.fr/pastel-00000810.
Pont, Mathieu. "Analysis of Ensembles of Topological Descriptors". Electronic Thesis or Diss., Sorbonne université, 2023. http://www.theses.fr/2023SORUS436.
Topological Data Analysis (TDA) forms a collection of tools to generically, robustly and efficiently reveal implicit structural patterns hidden in complex datasets. These tools make it possible to compute a topological representation for each member of an ensemble of datasets by encoding its main features of interest in a concise and informative manner. A major challenge then consists in designing analysis tools for such ensembles of topological descriptors. Several tools have been well studied for persistence diagrams, one of the most widely used descriptors. However, they suffer from a lack of specificity, which can yield identical data representations for significantly distinct datasets. In this thesis, we aim at developing more advanced analysis tools for ensembles of topological descriptors, capable of tackling the lack of discriminability of persistence diagrams and going beyond what was already available for these objects. First, we adapt the tools already available for persistence diagrams, such as distances, geodesics and barycenters, to merge trees, descriptors having a better specificity. Then, we go beyond the notion of average given by the barycenter in order to study the variability within an ensemble of topological descriptors. We adapt the Principal Component Analysis framework to persistence diagrams and merge trees, resulting in a dimensionality reduction method that indicates which structures in the ensemble are most responsible for the variability. However, this framework only detects linear patterns of variability in the ensemble. To tackle this, we generalize the framework to auto-encoders in order to detect non-linear, i.e. more complex, patterns in an ensemble of merge trees or persistence diagrams. Specifically, we propose a new neural network layer capable of natively processing these objects. We present applications of all this work in feature tracking in a time-varying ensemble, data reduction to compress an ensemble of topological descriptors, clustering to form homogeneous groups in an ensemble, and dimensionality reduction to create a visual map indicating how the data are organized with respect to each other in the ensemble.
Boudjeloud-Assala, Baya Lydia. "Visualisation et algorithmes génétiques pour la fouille de grands ensembles de données". Nantes, 2005. http://www.theses.fr/2005NANT2065.
We present cooperative approaches using interactive visualization methods and automatic dimension selection methods for knowledge discovery in databases. Most existing data mining methods work in an automatic way, and the user is not involved in the process. We try to involve the user more significantly in the data mining process in order to improve their confidence in, and comprehension of, the obtained models or results. Furthermore, since the size of data sets is constantly increasing, these methods must be able to deal with large data sets. We try to improve the performance of the algorithms on these high-dimensional data sets. We developed a genetic algorithm for dimension selection with a distance-based fitness function for outlier detection in high-dimensional data sets. This algorithm uses only a few dimensions to find the same outliers as in the whole data set and can easily treat high-dimensional data sets. The number of dimensions used being low enough, it is also possible to use visualization methods to explain and interpret the outlier detection results. It is then possible to build a model with the data expert, for example to qualify the detected element as an outlier or simply an error. We have also developed an evaluation measure for dimension selection in unsupervised classification and outlier detection. This measure enables us to find the same clusters as in the data set with all its dimensions, as well as clusters containing very few elements (outliers). Visual interpretation of the results shows the dimensions involved; they are considered relevant and interesting for clustering and outlier detection. Finally, we present a semi-interactive genetic algorithm involving the user more significantly in the selection and evaluation process of the algorithm.
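A distance-based outlier score restricted to a subset of dimensions, the kind of fitness a genetic dimension-selection loop could optimize, can be sketched as follows. This is a toy illustration under invented names, not the algorithm of the thesis.

```python
# Illustrative sketch: score a point by its mean distance to its k
# nearest neighbours, measured only on a selected subset of dimensions.
# A genetic algorithm could search over `dims` to maximize such a score
# for suspected outliers; this is an assumption, not the thesis's code.
import math

def outlier_score(data, point_idx, dims, k=2):
    """data: list of numeric tuples; dims: indices of kept dimensions."""
    p = [data[point_idx][d] for d in dims]
    dists = sorted(
        math.dist(p, [q[d] for d in dims])
        for j, q in enumerate(data) if j != point_idx
    )
    return sum(dists[:k]) / k
```

On a toy data set, a point far from the others in the two selected dimensions receives a much larger score than an inlier, regardless of the ignored third dimension.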
Gueunet, Charles. "Calcul haute performance pour l'analyse topologique de données par ensembles de niveaux". Electronic Thesis or Diss., Sorbonne université, 2019. http://www.theses.fr/2019SORUS120.
Topological Data Analysis requires efficient algorithms to deal with the continuously increasing size and level of detail of data sets. In this manuscript, we focus on three fundamental topological abstractions based on level sets: merge trees, contour trees and Reeb graphs. We propose three new efficient parallel algorithms for the computation of these abstractions on multi-core shared-memory workstations. The first algorithm developed in the context of this thesis is based on multi-thread parallelism for the contour tree computation. A second algorithm revisits the reference sequential algorithm to compute this abstraction and is based on local propagations expressible as parallel tasks. This new algorithm is in practice twice as fast sequentially as the reference algorithm designed in 2000 and offers speedups of one order of magnitude in parallel. A last algorithm, also relying on task-based local propagations, computes a more generic abstraction: the Reeb graph. Contrary to concurrent approaches, these methods provide the augmented version of these structures, hence enabling the full extent of level-set based analysis. The algorithms presented in this manuscript result today in the fastest implementations available to compute these abstractions. This work has been integrated into the open-source platform the Topology Toolkit (TTK).
Ndiaye, Marie. "Exploration de grands ensembles de motifs". Thesis, Tours, 2010. http://www.theses.fr/2010TOUR4029/document.
The abundance of patterns generated by knowledge extraction algorithms is a major problem in data mining. To facilitate the exploration of these patterns, two approaches are often used: the first is to summarize the sets of extracted patterns, and the second relies on the construction of visual representations of the patterns. However, the summaries are not structured and they are proposed without an exploration method. Furthermore, visualizations do not provide an overview of the pattern sets. We define a generic framework that combines the advantages of both approaches. It allows building summaries of pattern sets at different levels of detail. These summaries provide an overview of the pattern sets and are structured in the form of cubes on which OLAP navigational operators can be applied in order to explore the pattern sets. Moreover, we propose an algorithm which provides a summary of good quality whose size is below a given threshold. Finally, we instantiate our framework with association rules.
Ould, Yahia Sabiha. "Interrogation multi-critères d'une base de données spatio-temporelles". Troyes, 2005. http://www.theses.fr/2005TROY0006.
The study of human behavior in driving situations is of primary importance for the improvement of driver safety. This study is complex because of the numerous situations in which the driver may be involved. The objective of the CASSICE project (Symbolic Characterization of Driving Situations) is to elaborate a tool in order to simplify the analysis of the driver's behavior. In this work, we mainly take an interest in the indexation and querying of a multimedia database including the numerical data and the video sequences relating to a given type of driving situation. We put the emphasis on the queries to this database. They are often complex because they are formulated according to criteria depending on time and space, and they use terms of natural language.
Guerra, Thierry-Marie. "Analyse de données objectivo-subjectives : Approche par la théorie des sous-ensembles flous". Valenciennes, 1991. https://ged.uphf.fr/nuxeo/site/esupversions/a3f55508-7363-49a4-a531-9d723ff55359.
Texto completo da fonteDahabiah, Anas. "Extraction de connaissances et indexation de données multimédia pour la détection anticipée d'événements indésirables". Télécom Bretagne, 2010. http://www.theses.fr/2010TELB0117.
Similarity measurement is the cornerstone of the majority of data mining techniques and tasks, in which information elements can take any type (quantitative, qualitative, binary, ordinal, etc.) and may be affected by various forms of imperfection (uncertainty, imprecision, ambiguity, etc.). Additionally, the points of view of the experts and data owners must sometimes be considered and integrated even if presented in ambiguous or imprecise manners. Nonetheless, the existing methods and approaches have each handled only some aspects of the aforementioned points while disregarding the others. In reality, heterogeneity, imperfection, and personalization have been treated separately in prior works, using constraints and assumptions that can overburden the procedure, limit its applications, and increase its computing time, which is a crucial issue in data mining. In this thesis, we propose a novel approach essentially based on possibility theory to deal with all the aforementioned aspects within a unified, integrated general framework. In order to get deeper insight into and understanding of the information elements, the possibilistic modeling has been materialized via spatial, graphical and structural representations and applied to several data mining tasks using a medical database.
Raschia, Guillaume. "SaintEtiq : une approche floue pour la génération de résumés à partir de bases de données relationnelles". Nantes, 2001. http://www.theses.fr/2001NANT2099.
Texto completo da fonteKaliky, Pierre-Yves. "Etude des modèles de bruit impulsif dans les transmissions de données : Application à un modem numérique utilisant une modulation de phase octavalente". Nancy 1, 1991. http://www.theses.fr/1991NAN10416.
Texto completo da fonteRenard, François. "Inversion de données sismiques : prise en compte de la nature corrélée du bruit". Montpellier 2, 2003. http://www.theses.fr/2003MON20014.
Texto completo da fonteVoglozin, W. Amenel Abraham. "Le résumé linguistique de données structurées comme support pour l'interrogation". Nantes, 2007. http://www.theses.fr/2007NANT2040.
Texto completo da fonteSaint-Paul, Régis. "Une architecture pour le résumé en ligne de données relationnelles et ses applications". Nantes, 2005. http://www.theses.fr/2005NANT2029.
This work is intended to provide contributions in two research areas: large database summarization through fuzzy set-based techniques, and the application perspectives offered by the produced summaries. The summarization process is based on Zadeh's fuzzy set theory, which offers a strong theoretical model for the representation of uncertain or imprecise data, especially through the possibilistic extension of the relational database model. The produced summaries exhibit a description of subsets of the original database at different granularity levels. The process is designed to incrementally take into account the update operations performed on the summarized database. Its message-oriented architecture, based on Web services, allows the process to optimize memory consumption as well as processing cost. This open architecture is also designed to facilitate the integration of the summarization system within existing database management systems. Tests performed on very large datasets confirmed the scalability of the process and its linear time complexity. Applications in decision making as well as multimedia databases, based on real-life datasets, also confirm the practical usefulness of the produced summaries.
Azé, Jérôme. "Extraction de Connaissances à partir de Données Numériques et Textuelles". PhD thesis, Université Paris Sud - Paris XI, 2003. http://tel.archives-ouvertes.fr/tel-00011196.
Texto completo da fonteL'analyse de telles données est souvent contrainte par la définition d'un support minimal utilisé pour filtrer les connaissances non intéressantes.
Les experts des données ont souvent des difficultés pour déterminer ce support.
Nous avons proposé une méthode permettant de ne pas fixer un support minimal et fondée sur l'utilisation de mesures de qualité.
Nous nous sommes focalisés sur l'extraction de connaissances de la forme "règles d'association".
Ces règles doivent vérifier un ou plusieurs critères de qualité pour être considérées comme intéressantes et proposées à l'expert.
Nous avons proposé deux mesures de qualité combinant différents critères et permettant d'extraire des règles intéressantes.
Nous avons ainsi pu proposer un algorithme permettant d'extraire ces règles sans utiliser la contrainte du support minimal.
Le comportement de notre algorithme a été étudié en présence de données bruitées et nous avons pu mettre en évidence la difficulté d'extraire automatiquement des connaissances fiables à partir de données bruitées.
Une des solutions que nous avons proposée consiste à évaluer la résistance au bruit de chaque règle et d'en informer l'expert lors de l'analyse et de la validation des connaissances obtenues.
Enfin, une étude sur des données réelles a été effectuée dans le cadre d'un processus de fouille de textes.
Les connaissances recherchées dans ces textes sont des règles d'association entre des concepts définis par l'expert et propres au domaine étudié.
Nous avons proposé un outil permettant d'extraire les connaissances et d'assister l'expert lors de la validation de celles-ci.
Les différents résultats obtenus montrent qu'il est possible d'obtenir des connaissances intéressantes à partir de données textuelles en minimisant la sollicitation de l'expert dans la phase d'extraction des règles d'association.
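The support-free, quality-measure-driven extraction described in this abstract can be illustrated with a minimal sketch that scores a candidate association rule by two standard quality measures, confidence and lift, instead of pruning it by a minimum support. These two measures are classical; they are not claimed to be the specific measures proposed in the thesis, and the function name is invented.

```python
# Hedged sketch: score an association rule antecedent -> consequent by
# confidence and lift on a list of transactions (sets of items),
# without any minimum-support pruning.
def rule_quality(transactions, antecedent, consequent):
    n = len(transactions)
    n_ante = sum(1 for t in transactions if antecedent <= t)
    n_cons = sum(1 for t in transactions if consequent <= t)
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    confidence = n_both / n_ante if n_ante else 0.0
    lift = confidence / (n_cons / n) if n_cons else 0.0
    return confidence, lift
```

A lift above 1 indicates that the antecedent raises the probability of the consequent; rules can then be ranked and shown to the expert by such scores rather than filtered by support.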
Thomopoulos, Rallou. "Représentation et interrogation élargie de données imprécises et faiblement structurées". Paris, Institut national d'agronomie de Paris Grignon, 2003. http://www.theses.fr/2003INAP0018.
This work is part of a project applied to predictive microbiology, which is built on a database and on its querying system. The data used in the project are weakly structured and may be imprecise, and the database cannot provide exact answers to every query, so a flexible querying system is necessary. We use the conceptual graph model in order to take into account weakly structured data, and fuzzy set theory in order to represent imprecise data and fuzzy queries. The purpose of this work is to provide a combination of these two formalisms.
Spill, Yannick. "Développement de méthodes d'échantillonnage et traitement bayésien de données continues : nouvelle méthode d'échange de répliques et modélisation de données SAXS". Paris 7, 2013. http://www.theses.fr/2013PA077237.
The determination of protein structures and other macromolecular complexes is becoming more and more difficult. The simplest cases have already been determined, and today's research in structural bioinformatics focuses on ever more challenging targets. To successfully determine the structure of these complexes, it has become necessary to combine several kinds of experiments and to relax the quality standards during acquisition. In other words, structure determination makes an increasing use of sparse, noisy and inconsistent data. It is therefore becoming essential to quantify the accuracy of a determined structure. This quantification is superbly achieved by statistical inference. In this thesis, I develop a new sampling algorithm, Convective Replica-Exchange, designed to find probable structures more robustly. I also propose a proper statistical treatment for continuous data, such as Small-Angle X-Ray Scattering data.
Alilaouar, Abdeslame. "Contribution à l'interrogation flexible de données semi-structurées". Toulouse 3, 2007. http://thesesups.ups-tlse.fr/90/.
Many querying languages have been proposed to manipulate Semi-Structured Data (SSD) and to extract information relevant (in terms of structure and/or content) to the user. Such querying languages should take into account not only the content but also the underlying structure, since it can completely change their relevance and adequacy with respect to the needs expressed by the user. However, the lack of prior knowledge about the structure of SSD and its heterogeneity make classical database languages inadequate. Work undertaken on flexible database querying has revealed that fuzzy logic is particularly well suited for modelling the notion of flexibility and preferences according to human reasoning. In this sense, we propose a flexible querying model for SSD in general and XML documents in particular, taking into account both the content and the underlying structure of SSD. Fuzzy logic is used to represent the user's preferences on the content and structure of SSD. At the end of the evaluation process, every answer is associated with a degree in the interval ]0, 1]: the lower this degree, the less relevant the answer. This degree is calculated using membership degrees and similarity measures known from information retrieval systems for the content, and the minimum spanning tree for the structure. The proposed model has been reviewed and validated using the PRETI platform and the INEX benchmark, thanks to the prototype that we developed.
Hebert, Pierre-Alexandre. "Analyse de données sensorielles : une approche ordinale floue". Compiègne, 2004. http://www.theses.fr/2004COMP1542.
Sensory profile data aim at describing the sensory perceptions of human subjects. Such data are composed of scores attributed by human sensory experts (or judges) in order to describe a set of products according to sensory descriptors. All assessments are repeated, usually three times. The thesis describes a new analysis method based on a fuzzy modelling of the scores. The first step of the method consists in extracting and encoding the relevant information of each replicate into a fuzzy weak dominance relation. Then an aggregation procedure over the replicates allows the perception of each judge to be synthesized into a new fuzzy relation. In a similar way, a consensual relation is finally obtained for each descriptor by fusing the relations of the judges. So as to ensure the interpretability of the fused relations, fuzzy preference theory is used. A set of graphical tools is then proposed for the mono- and multidimensional analysis of the obtained relations.
Longueville, Véronique. "Modélisation, calcul et évaluation de liens pour la navigation dans les grands ensembles d'images fixes". Toulouse 3, 1993. http://www.theses.fr/1993TOU30149.
Texto completo da fonteMakhalova, Tatiana. "Contributions à la fouille d'ensembles de motifs : des données complexes à des ensembles de motifs signifiants et réutilisables". Electronic Thesis or Diss., Université de Lorraine, 2021. http://www.theses.fr/2021LORR0124.
In this thesis, we study different aspects of pattern mining in binary and numerical tabular datasets. The objective of pattern mining is to discover a small set of non-redundant patterns that may entirely cover a given dataset and be interpreted as useful and significant knowledge units. We focus on some key issues, such as (i) the formal definition of pattern interestingness, (ii) the minimization of pattern explosion, (iii) measures for evaluating the performance of pattern mining, and (iv) the discrepancy between interestingness and quality of the discovered pattern sets. Moreover, we go beyond the typical perspectives of pattern mining and investigate the intrinsic structure underlying a tabular dataset. The main contributions of this research work are theoretical, conceptual, and practical. Regarding the theoretical novelty, we propose the so-called closure structure and the GDPM algorithm for computing it. The closure structure allows us to estimate both data and pattern complexity. Furthermore, in practice the closure structure may be used to represent the data topology w.r.t. an interestingness measure. Conceptually, the closure structure allows an analyst to understand the intrinsic data configuration before selecting any interestingness measure, rather than understanding the data by means of an arbitrarily selected interestingness measure. In this research work, we also discuss the difference between interestingness and quality of pattern sets. We propose to adopt the best practices of supervised learning in pattern mining. Based on that, we developed an algorithm for itemset mining, called KeepItSimple, which relates interestingness and the quality of pattern sets. In practice, KeepItSimple allows us to efficiently mine a set of interesting and good-quality patterns without any pattern explosion. In addition, we propose an algorithm for a greedy enumeration of likely-occurring itemsets that can be used when frequent closed itemset miners return too many itemsets. The last practical contribution consists in developing an MDL-based algorithm called Mint for mining pattern sets in numerical data. The Mint algorithm relies on a strong theoretical foundation and at the same time has the practical objective of returning a small set of numerical, non-redundant, and informative patterns. The experiments show that Mint behaves very well in practice and usually outperforms its competitors.
Toutain, Matthieu. "EdP géometriques pour le traitement et la classification de données sur graphes". Caen, 2015. https://hal.archives-ouvertes.fr/tel-01258738.
Partial differential equations (PDEs) play a key role in the mathematical modelling of phenomena in applied sciences. In particular, in image processing and computer vision, geometric PDEs have been successfully used to solve many problems, such as image restoration, segmentation, inpainting, etc. Nowadays, more and more data are collected as graphs or networks, or as functions defined on these networks. Hence, there is an interest in extending PDEs to process irregular data or graphs of arbitrary topology. The presented work follows this idea. More precisely, this work is about geometric partial difference equations (PdEs) for data processing and classification on graphs. In the first part, we propose a transcription of the normalized p-Laplacian on weighted graphs of arbitrary topology by using the framework of PdEs. This adaptation allows us to introduce a new class of p-Laplacians on graphs in non-divergence form. In this part, we also introduce a formulation of the p-Laplacian on graphs defined as a convex combination of gradient terms. We show that this formulation unifies and generalizes many existing difference operators defined on graphs. Then, we use this operator with the Poisson equation to compute generalized distances on graphs. In the second part, we apply the graph operators we defined to the tasks of semi-supervised classification and clustering. We compare them to existing graph operators and to some state-of-the-art methods, such as Multiclass Total Variation clustering (MTV), clustering by non-negative matrix factorization (NMFR) and the INCRES method.
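One common instance of a graph operator built as a convex combination of gradient terms mixes a non-local infinity-Laplacian with a degree-normalized graph Laplacian. The sketch below illustrates that idea on a tiny weighted graph; the discretization, names and parameter are assumptions for illustration, not the exact operator of the thesis.

```python
# Hedged sketch: a graph operator defined as a convex combination of a
# non-local infinity-Laplacian term and a degree-normalized Laplacian
# term, evaluated at a single node of a weighted graph.
def graph_p_laplacian(f, w, u, alpha=0.5):
    """f: node -> value; w: symmetric dict (u, v) -> weight > 0;
    u: node at which the operator is evaluated."""
    nbrs = [(v, wt) for (a, v), wt in w.items() if a == u]
    diffs = [wt * (f[v] - f[u]) for v, wt in nbrs]
    # infinity-Laplacian term: mean of extremal weighted differences
    inf_lap = 0.5 * (max(diffs) + min(diffs))
    # normalized 2-Laplacian term: degree-weighted average of differences
    lap2 = sum(diffs) / sum(wt for _, wt in nbrs)
    return alpha * inf_lap + (1 - alpha) * lap2
```

Varying `alpha` interpolates between the two terms, which is the kind of unification of difference operators the abstract refers to.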
Bouron, Pascal. "Méthodes ensemblistes pour le diagnostic, l'estimation d'état et la fusion de données temporelles". Compiègne, 2002. http://www.theses.fr/2002COMP1395.
Set-membership methods for diagnosis, state estimation and data fusion. The works presented in this thesis constitute a contribution to the use of set-membership methods for state estimation and for fault detection and isolation. These methods are used in the practical context of localization and dynamic diagnosis of a vehicle. After describing the system used in the analysis and defining the dynamical model, we present an adaptation of the classical diagnosis method based on analytical redundancy to the context of a bounded-error modelling of the noises. This method has been validated with real data coming from our demonstrator. The second original aspect of this work is the use of set-membership methods for state estimation. It has led to the elaboration of alternative methods to improve the run times of these algorithms. Finally, the development of a syntactic analysis module allowed us to easily apply the methods based on constraint propagation. We have validated these methods with real data for the localization, and with simulated data for the estimation of the drift. Moreover, a comparison of the accuracy of the estimation with extended Kalman filtering has been carried out.
Zemirline, Abdelhamid. "Définition et fusion de systèmes diagnostic à l'aide d'un processus de fouille de données : Application aux systèmes diagnostics". Télécom Bretagne, 2008. http://www.theses.fr/2008TELB0047.
Nowadays, the number of applications requiring data mining is growing rapidly in all domains. In medicine, we find a number of such applications; however, they are still at the experimental or prototype stage. For various reasons, only a very small number of them enter the daily practice of health professionals. One example is the non-integration of certain notions of the 'graduation' type, i.e., a patient is affected by an illness but only to such a degree that we cannot consider him completely ill. Then, there is the problem of the degree of certainty, and of the integration of new knowledge and its updates, which must be taken into consideration for medical applications. In this work, we develop two types of diagnostic systems that rely on fuzzy logic theory to model uncertainty and to perform the analysis in a way similar to human reasoning. The first system, based on case-based reasoning, generates from the case base a knowledge base composed of the membership degrees of a given case to the possible pathologies, in such a way that we can easily estimate the similarity between cases. The second system we have developed is based on rule-based reasoning. Another point developed in our work is the fusion of knowledge from homogeneous knowledge sources coming from distinct diagnostic systems. This fusion combines different experiences in a single system by taking into account the characteristics of the different sources without having to reconstruct the knowledge base. We have applied this fusion to both aforementioned diagnostic systems, evaluating them on a medical database. The last part of our work deals with the integration of the systems described earlier in a medical environment, taking into account all the constraints associated with that environment.
Blanchard, Frédéric. "Visualisation et classification de données multidimensionnelles : Application aux images multicomposantes". Reims, 2005. http://theses.univ-reims.fr/exl-doc/GED00000287.pdf.
The analysis of multicomponent images is a crucial problem, and visualization and clustering are two relevant questions about it. We decided to work in the more general frame of data analysis to answer these questions. The preliminary step of this work is to describe the problems induced by dimensionality and to study the current dimensionality reduction methods. The visualization problem is then considered and a contribution is presented. We propose a new method of visualization through color images that provides an immediate and synthetic view of the data. Applications are presented. The second contribution lies upstream, with the clustering procedure strictly speaking. We establish a new kind of data representation by using rank transformation, fuzziness and aggregation procedures. Its use improves the clustering procedures by dealing with clusters of dissimilar density or size and by making them more robust. This work presents two important contributions to the field of data analysis applied to multicomponent images. The variety of the tools involved (originally from decision theory, uncertainty management, data mining or image processing) makes the presented methods usable in many diversified areas as well as in multicomponent image analysis.
Csikós, Mónika. "Efficient Approximations of High-Dimensional Data". Thesis, Université Gustave Eiffel, 2022. http://www.theses.fr/2022UEFL2004.
In this thesis, we study approximations of set systems (X, S), where X is a base set and S consists of subsets of X called ranges. Given a finite set system, our goal is to construct a small subset of X such that each range is 'well-approximated'. In particular, for a given parameter epsilon in (0,1), we say that a subset A of X is an epsilon-approximation of (X, S) if for any range R in S, the fractions |A cap R|/|A| and |R|/|X| are epsilon-close. Research on such approximations started in the 1950s, with random sampling being the key tool for showing their existence. Since then, the notion of approximations has become a fundamental structure across several communities: learning theory, statistics, combinatorics and algorithms. A breakthrough in the study of approximations dates back to 1971, when Vapnik and Chervonenkis studied set systems with finite VC dimension, which turned out to be a key parameter for characterising their complexity. For instance, if a set system (X, S) has VC dimension d, then a uniform sample of O(d/epsilon^2) points is an epsilon-approximation of (X, S) with high probability. Importantly, the size of the approximation only depends on epsilon and d; it is independent of the input sizes |X| and |S|! In the first part of this thesis, we give a modular, self-contained, intuitive proof of the above uniform sampling guarantee. In the second part, we give an improvement of a 30-year-old algorithmic bottleneck: constructing matchings with low crossing number. This can be applied to build approximations with improved guarantees. Finally, we answer a 30-year-old open problem of Blumer et al. by proving tight lower bounds on the VC dimension of unions of half-spaces, a geometric set system that appears in several applications, e.g. coreset constructions.
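The epsilon-approximation definition quoted in this abstract translates directly into code. The following sketch (function name invented) checks the condition for an explicit finite set system given as Python sets.

```python
# Direct check of the definition above: A is an epsilon-approximation
# of (X, S) if, for every range R in S, the fractions |A∩R|/|A| and
# |R|/|X| differ by at most epsilon.
def is_eps_approximation(X, S, A, eps):
    """True if A is an epsilon-approximation of the set system (X, S)."""
    return all(
        abs(len(A & R) / len(A) - len(R) / len(X)) <= eps
        for R in S
    )
```

For example, on X = {0, ..., 9} with ranges {0..4}, {0..9} and {2..7}, the two-point sample {0, 5} is a 0.1-approximation, while {0, 1} is not.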
Paris, Silvia. "Méthodes de détection parcimonieuses pour signaux faibles dans du bruit : application à des données hyperspectrales de type astrophysique". Phd thesis, Université Nice Sophia Antipolis, 2013. http://tel.archives-ouvertes.fr/tel-00933827.
Texto completo da fonte
Mokhtari, Amine. "Système personnalisé de planification d'itinéraire unimodal : une approche basée sur la théorie des ensembles flous". Rennes 1, 2011. http://www.theses.fr/2011REN1E004.
Texto completo da fonte
Séchet, Etienne. "Modélisation d'une connaissance imprécise sur les influences des conditions météorologiques dans la propagation du son, à partir de données expérimentales". Paris 9, 1996. https://portail.bu.dauphine.fr/fileviewer/index.php?doc=1996PA090026.
Texto completo da fonte
Daniel-Vatonne, Marie-Christine. "Les termes : un modèle de représentation et structuration de données symboliques". Montpellier 2, 1993. http://www.theses.fr/1993MON20031.
Texto completo da fonte
Veron, Didier. "Utilisation des FADC pour la reconstruction et l'analyse des données de bruit de fond dans l'expérience neutrino de Chooz". Lyon 1, 1997. http://www.theses.fr/1997LYO10074.
Texto completo da fonte
Desquesnes, Xavier. "Propagation de fronts et p-laplacien normalisé sur graphes : algorithmes et applications au traitement d’images et de données". Caen, 2012. http://www.theses.fr/2012CAEN2073.
Texto completo da fonteThis work deals with the transcription of continuous partial derivative equations to arbitrary discrete domains by exploiting the formalism of partial difference equations defined on weighted graphs. In the first part, we propose a transcription of the normalized p-Laplacian operator to the graph domains as a linear combination between the non-local infinity Laplacian and the normalized Laplacian (both in their discrete version). This adaptation can be considered as a new class of p-Laplacian operators on graphs that interpolate between non-local infinity Laplacian and normalized Laplacian. In the second part, we present an adaptation of fronts propagation equations on weighted graphs. These equations are obtained by the transcription of the continuous level sets method to a discrete formulation on the graphs domain. Beyond the transcription in itself, we propose a very general formulation and efficient algorithms for the simultaneous propagation of several fronts on a single graph. Both transcription of the p-Laplacian operator and level sets method enable many applications in image segmentation and data clustering that are illustrated in this manuscript. Finally, in the third part, we present a concrete application of the different tools proposed in the two previous parts for computer aided diagnosis. We also present the Antarctic software that was developed during this PhD
Bergès, Corinne. "Étude de systèmes d'acquisitions de données dans deux milieux contraignants : expérimentation spatiale et prospection sismique". Toulouse, INPT, 1999. http://www.theses.fr/1999INPT026H.
Texto completo da fonte
Jallet, Roxane. "Splines de régression et splines de lissage en régression non paramétrique avec bruit processus". Paris 6, 2008. http://www.theses.fr/2008PA066054.
Texto completo da fonteIn the present work, we are interested in methods for estimating a regular function observed with process noise, using smoothing splines and regression splines. Convergence rate results for smoothing splines are presented in the case of process noise, and an extension to unbalanced data is proposed. In order to build the regression spline estimators, we introduce two criteria: ordinary least squares and generalized least squares. For these two regression spline estimators, convergence rates are studied and compared. Finally, the various estimators are compared through simulations.
Nautet, Vincent. "Etude des méthodes de calcul du rayonnement acoustique des structures à partir des données vibratoires : Application aux antennes des sous-marins". Compiègne, 1998. http://www.theses.fr/1998COMP1104.
Texto completo da fonte
Dantan, Aurélien. "Génération, stockage et manipulation d'états non classiques pour des ensembles atomiques et des champs électromagnétiques". Phd thesis, Université Pierre et Marie Curie - Paris VI, 2005. http://tel.archives-ouvertes.fr/tel-00011004.
Texto completo da fontemanipulation of non-classical states of light and atoms through the interaction between a cloud of cold atoms and optical fields in a cavity. After experimentally generating squeezed and entangled states of the field when the atoms behave as a Kerr medium, we theoretically study the possibility of generating such states in three-level systems, as well as the reduction of atomic quantum fluctuations below the standard quantum noise. We then present several schemes for transferring and storing the fluctuations of non-classical states of the field onto the collective spin of an atomic ensemble, in order to realize a cold-atom quantum memory. As applications to quantum information, we study the entanglement and teleportation of atomic ensembles, the realization of long-lived quantum memories with 3He nuclear spins, and the entanglement of mechanical oscillators.
Bahri, Emna. "Amélioration des procédures adaptatives pour l'apprentissage supervisé des données réelles". Thesis, Lyon 2, 2010. http://www.theses.fr/2010LYO20089/document.
Texto completo da fonteMachine learning often faces various difficulties when confronted with real data. Indeed, such data are generally complex, voluminous and heterogeneous, owing to the variety of their sources. Among these difficulties, the best known concern the sensitivity of algorithms to noise and to unbalanced data. Overcoming these problems is a real challenge for improving the effectiveness of learning on real data. In this thesis, we have chosen to improve adaptive procedures (boosting), which are less effective in the presence of noise or of unbalanced data. First, we are interested in making boosting robust to noise. Most boosting procedures have contributed greatly to improving the predictive power of classifiers in data mining, but they are sensitive to noisy data. In this case, two problems arise: (1) over-fitting due to the noisy examples, and (2) a decrease in the convergence rate of boosting. Against these two problems, we propose AdaBoost-Hybrid, an adaptation of the AdaBoost algorithm that takes into account the mistakes made in all previous iterations. Experimental results are very promising. Then, we are interested in another difficult problem: prediction when the classes are unbalanced. We propose an adaptive method based on boosted associative classification. The interest of using association rules is that they allow focusing on small groups of cases, which is well suited to unbalanced data. This method relies on three contributions: (1) FCP-Growth-P, a supervised algorithm for extracting class frequent itemsets, derived from FP-Growth by introducing a pruning condition based on counter-examples to specify the rules; (2) W-CARP, an associative classification method which aims to give results at least equivalent to those of existing approaches, but faster; (3) CARBoost, a classification method that uses the associative classifier W-CARP as a weak learner.
Finally, in a chapter devoted to the specific application of intrusion detection, we compare the results of AdaBoost-Hybrid and CARBoost to those of reference methods (KDD Cup 99 data).
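For reference, plain AdaBoost, which AdaBoost-Hybrid adapts, can be sketched in a few lines. This is a minimal standard AdaBoost with 1-D threshold stumps, not the thesis's variant: the hybrid reweighting over all previous iterations is not reproduced.

```python
import math

def train_adaboost(X, y, n_rounds=10):
    """Standard AdaBoost on 1-D data with labels in {-1, +1}, using
    exhaustive threshold stumps as weak learners."""
    n = len(X)
    w = [1.0 / n] * n                      # uniform initial example weights
    ensemble = []                          # list of (alpha, threshold, polarity)
    for _ in range(n_rounds):
        best = None                        # (weighted error, threshold, polarity, preds)
        for t in sorted(set(X)):
            for pol in (+1, -1):
                preds = [pol if x >= t else -pol for x in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, t, pol, preds)
        err, t, pol, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)      # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)    # weak learner's vote weight
        ensemble.append((alpha, t, pol))
        # Increase weights of misclassified examples, then renormalize.
        w = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(w, y, preds)]
        s = sum(w)
        w = [wi / s for wi in w]
    return ensemble

def predict(ensemble, x):
    score = sum(a * (pol if x >= t else -pol) for a, t, pol in ensemble)
    return 1 if score >= 0 else -1

X = [1, 2, 3, 8, 9, 10]
y = [-1, -1, -1, 1, 1, 1]
model = train_adaboost(X, y, n_rounds=5)
print([predict(model, x) for x in X])
```

The noise sensitivity discussed in the abstract shows up in the weight update: a mislabeled example keeps gaining weight round after round, which is precisely what robust variants try to dampen.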
Voglozin, W. Amenel. "Le résumé linguistique de données structurées comme support pour l'interrogation". Phd thesis, Université de Nantes, 2007. http://tel.archives-ouvertes.fr/tel-00481049.
Texto completo da fonte
Magnan, Christophe Nicolas. "Apprentissage à partir de données diversement étiquetées pour l'étude du rôle de l'environnement local dans les interactions entre acides aminés". Aix-Marseille 1, 2007. http://www.theses.fr/2007AIX11022.
Texto completo da fonteThe 3D structure of proteins is constrained by some interactions between distant amino acids in the primary sequences. An accurate prediction of these bonds may be a step forward for the prediction of the 3D structure from sequences. A review of the literature raises questions about the role of the neighbourhood of bonded amino acids in the formation of these bonds. We show that we have to investigate uncommon learning frameworks to answer these questions. The first one is a particular case of semi-supervised learning, in which the only labelled data to learn from belong to one class, and the second one considers that the data are subject to class-conditional classification noise. We show that learning in these frameworks leads to ill-posed problems. We give some assumptions that make these problems well-posed. We propose adaptations of well-known methods to these learning frameworks. We apply them to try to answer the questions on the biological problem considered in this study
Barriot, Roland. "Intégration des connaissances biologiques à l'échelle de la cellule". Bordeaux 1, 2005. http://www.theses.fr/2005BOR13100.
Texto completo da fonte
Ilponse, Fabrice. "Analyse du bruit dû aux couplages capacitifs dans les circuits intégrés numériques fortement submicroniques". Paris 6, 2002. http://www.theses.fr/2002PA066417.
Texto completo da fonte
Moreau, Aurélien. "How fuzzy set theory can help make database systems more cooperative". Thesis, Rennes 1, 2018. http://www.theses.fr/2018REN1S043/document.
Texto completo da fonteIn this thesis, we are interested in how fuzzy logic can be leveraged to improve the interactions between relational database systems and humans. Cooperative answering techniques aim to help users harness the potential of DBMSs. These techniques are expected to be robust and to always provide answers to users. "Empty set (0.00 sec)" is a typical example of an answer one may wish never to obtain. The informative nature of explanations is higher than that of the actual answers in several cases, e.g. empty answer sets and plethoric answer sets, hence the interest of robust cooperative answering techniques capable of both explaining and improving an answer set. Using terms from natural language to describe data, with labels from fuzzy vocabularies, contributes to the interpretability of explanations. Offering to define and refine vocabulary terms increases personalization and improves interpretability by using the user's own words. We propose to investigate the use of explanations in a cooperative answering setting along three research axes: 1) in the presence of a plethoric set of answers; 2) in the context of recommendations; 3) in the context of a query/answering problem. These axes define cooperative techniques where the interest of explanations is to enable users to understand how results are computed, in an effort of transparency. The informativeness of the explanations brings an added value to the direct results, and that in itself represents a cooperative answer.
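A fuzzy vocabulary label of the kind mentioned above is commonly modelled by a trapezoidal membership function. The sketch below, including the 'young' label and its parameters, is a hypothetical illustration, not taken from the thesis:

```python
def trapezoid(a, b, c, d):
    """Trapezoidal membership function: 0 outside (a, d), rises on [a, b],
    equals 1 on [b, c], falls on [c, d]. Assumes a < b <= c < d."""
    def mu(x):
        if x <= a or x >= d:
            return 0.0
        if b <= x <= c:
            return 1.0
        if x < b:
            return (x - a) / (b - a)
        return (d - x) / (d - c)
    return mu

# Hypothetical label 'young' over an age attribute: fully satisfied up to 25,
# gradually less so until 35.
young = trapezoid(-1, 0, 25, 35)
print(young(20), young(30), young(40))  # prints: 1.0 0.5 0.0
```

Gradual membership is what lets an explanation say an answer is "rather young" instead of forcing a hard yes/no cut, and letting users move the breakpoints (b, c, d) is one way to refine vocabulary terms in their own words.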
Pajot, Gwendoline. "Caractérisation, analyse et interprétation des données de gradiométrie en gravimétrie". Phd thesis, Institut de physique du globe de paris - IPGP, 2007. http://tel.archives-ouvertes.fr/tel-00341117.
Texto completo da fonte
Chen, Mingkun. "Classification de variables autour de variables latentes avec filtrage de l’information : application à des données en grande dimension". Nantes, 2014. http://archive.bu.univ-nantes.fr/pollux/show.action?id=dc97aa41-ffd6-432b-a740-06382adaca0a.
Texto completo da fonteWith the development of high-throughput analysis techniques, researchers have adopted systematic approaches to describe simultaneously a large number of variables. However, one of the important challenges lies in the diffculty to summarise and interpret this enormous quantity of information. We adopt a clustering of variables approach (CLV) which allows us to highlight disjunctive structures, and therefore, reduce the dimensionality of the problem and facilitate the interpretation of the data at hand. However, in order to further improve the relevance of such approaches, two directions of investigation are proposed. The first direction involves filtering the data by setting aside atypical variables or variables associated with noise. For this purpose, a strategy to create an additional group of variables, called noise cluster, and a strategy based on the definition of sparse latent variables are proposed and compared. The second direction concerns the development of a clustering of variables procedure directed to the explanation of a response variable. The implementation of iterative algorithms provides a sequence of group latent variables with good predictive performance. These latent variables are also easy to interpret since each predictive component is associated with a subset of variables assumed to have a one-dimensional structure
Biletska, Krystyna. "Estimation en temps réel des flux origines-destinations dans un carrefour à feux par fusion de données multicapteurs". Compiègne, 2010. http://www.theses.fr/2010COMP1893.
Texto completo da fonteThe quality of the information about the origins and destinations (OD) of vehicles in a junction influences the performance of many road transport systems, and the period of its update determines the temporal scale at which these systems operate. We are interested in the problem of reconstituting the OD flows of vehicles crossing a junction, at each traffic light cycle, using the traffic light states and traffic measurements from video sensors. The traffic measurements, provided every second, are the vehicle counts made at each entrance and exit of the junction and the number of vehicles stopped at each inner section of the junction. These real data are subject to imperfections. The only existing method capable of solving this problem, named ORIDI, does not take data imperfection into account. We propose a new method that models data imprecision using the theory of fuzzy subsets. It can be applied to any type of junction and is independent of the type of traffic light strategy. The method estimates OD flows from the vehicle conservation law, represented by an underdetermined system of equations constructed dynamically at each traffic light cycle thanks to fuzzy a-timed Petri nets. A unique solution is found through eight different methods, which produce estimates in the form of a point, an interval or a fuzzy set. Our study shows that the crisp methods are as accurate as ORIDI, but more robust when one of the video sensors fails. The interval and fuzzy methods, though less accurate than ORIDI, try to guarantee that the solution includes the true value.
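The underdetermined conservation-law system can be illustrated on a toy junction. The minimum-norm least-squares solution below is just one possible crisp point estimate; the fuzzy Petri-net construction and the interval/fuzzy estimates of the thesis are not reproduced, and the junction layout is an assumption:

```python
import numpy as np

# Toy junction with entrances 1, 2 and exits A, B. Unknown OD flows:
#   x = [f_1->A, f_1->B, f_2->A, f_2->B]
# Each measured count is a sum of OD flows (vehicle conservation law).
A = np.array([
    [1, 1, 0, 0],   # vehicles entering at 1
    [0, 0, 1, 1],   # vehicles entering at 2
    [1, 0, 1, 0],   # vehicles leaving at A
    [0, 1, 0, 1],   # vehicles leaving at B
], dtype=float)
b = np.array([10, 6, 9, 7], dtype=float)  # measured counts over one cycle

# The rows are linearly dependent (in = out), so the system is underdetermined:
# lstsq returns the minimum-norm solution among all exact solutions.
x, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.round(x, 2))
```

Any x of the form [t, 10-t, 9-t, t-3] satisfies the counts; picking one (here the minimum-norm point t = 5.5) versus returning the whole feasible interval is exactly the point-versus-interval/fuzzy distinction made in the abstract.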
Adong, Feddy. "Écoulements diphasiques, surfaces rugueuses et vitesse de glissement : modélisation asymptotique et calcul Numérique". Caen, 2014. http://www.theses.fr/2014CAEN2065.
Texto completo da fonteThe thesis considers two-phase flows of immiscible fluids, in particular flows past micro-textured rough surfaces where the gas phase is completely trapped within the roughness cavities. The work is divided into two parts, the first dedicated to asymptotic modelling and the second to developing a computational solver to simulate flows characterised by strong capillarity. The asymptotic analysis is based on interfaces with small deflections and focuses on a rectangular cavity. This leads to a semi-analytic approximation when the viscous stress applied by the fluid trapped beneath the interface, within the roughness cavity, is neglected. It is found that when the cavity is shallow, the viscous stress must be taken into account, and a second approximation is then needed. In both cases, it is shown that taking into account the interface curvature and/or the flow of the trapped fluid implies a reduction of the effective slip. In the second part, a new computational code is developed by modifying the interFoam solver of the open-source OpenFOAM package. In this new solver, the curvature computation is improved by the introduction of a level-set function, which is then coupled to a numerical filter to further reduce parasitic oscillations. Furthermore, essentially non-oscillatory schemes, different models of dynamic contact angles and a non-dimensional formulation are also integrated into the code. These contributions and modifications are validated on benchmark flow problems, and the code is finally applied to the problem under consideration. Comparisons between the numerical results and the asymptotic modelling are also presented.
Al-Najdi, Atheer. "Une approche basée sur les motifs fermés pour résoudre le problème de clustering par consensus". Thesis, Université Côte d'Azur (ComUE), 2016. http://www.theses.fr/2016AZUR4111/document.
Texto completo da fonteClustering is the process of partitioning a dataset into groups, so that the instances in the same group are more similar to each other than to instances in any other group. Many clustering algorithms have been proposed, but none of them provides good-quality partitions in all situations. Consensus clustering aims to enhance the clustering process by combining different partitions obtained from different algorithms to yield a better-quality consensus solution. In this work, a new consensus clustering method, called MultiCons, is proposed. It uses the frequent closed itemset mining technique in order to discover the similarities between the different base clustering solutions. The identified similarities are presented in the form of clustering patterns, each of which defines the agreement between a set of base clusters in grouping a set of instances. By dividing these patterns into groups based on the number of base clusters that define the pattern, MultiCons generates a consensus solution from each group, resulting in multiple consensus candidates. These different solutions are presented in a tree-like structure, called the ConsTree, that facilitates understanding of the process of building the multiple consensuses, and also of the relationships between the data instances and their structuring in the data space. Five consensus functions are proposed in this work in order to build a consensus solution from the clustering patterns. Approach 1 simply merges any intersecting clustering patterns. Approach 2 can either merge or split intersecting patterns based on a proposed measure, called the intersection ratio.
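The notion of a clustering pattern, an agreement of several base clusters on a set of instances, can be sketched as follows. This simplification groups instances by their full membership vector; MultiCons's actual frequent-closed-itemset mining over subsets of base clusterings is not reproduced:

```python
from collections import defaultdict

def clustering_patterns(partitions):
    """Group the instances that every base clustering places together.
    Each partition maps instance -> cluster id; instances sharing the same
    membership vector across all base clusterings form one agreement pattern."""
    groups = defaultdict(set)
    for i in partitions[0]:
        key = tuple(p[i] for p in partitions)   # cluster of i in each base solution
        groups[key].add(i)
    return list(groups.values())

# Two base clusterings that disagree on 'b' but agree on grouping 'c' and 'd'.
p1 = {'a': 0, 'b': 0, 'c': 1, 'd': 1}
p2 = {'a': 0, 'b': 1, 'c': 1, 'd': 1}
print(clustering_patterns([p1, p2]))
```

Patterns like {'c', 'd'} (supported by both base solutions) are the building blocks that the consensus functions then merge or split.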
Broudin, Gwenaelle. "Recherche de la double décroissance bêta sans émission de neutrino du 82Se. Analyse des données et modélisation du bruit de fond du détecteur NEMO3". Phd thesis, Université Sciences et Technologies - Bordeaux I, 2007. http://tel.archives-ouvertes.fr/tel-00404363.
Texto completo da fonte
Broudin, Gwénaëlle. "Recherche de la double décroissance bêta sans émission de neutrino du ⁸²Se : Analyse des données et modélisation du bruit de fond du détecteur NEMO3". Bordeaux 1, 2007. http://www.theses.fr/2007BOR13376.
Texto completo da fonte
Jabbour-Hattab, Jean. "Une approche probabiliste du profil des arbres binaires de recherche". Versailles-St Quentin en Yvelines, 2001. http://www.theses.fr/2001VERS002V.
Texto completo da fonte