Doctoral dissertations on the topic "Classification des données brevets"
Consult the top 50 doctoral dissertations for your research on the topic "Classification des données brevets".
Baldit, Patrick. "La sériation des similarités spécifiques : outil pour la recherche de l'information stratégique : une méthode de classification automatique de l'information issue des bases de données en veille technologique". Aix-Marseille 3, 1994. http://www.theses.fr/1994AIX30086.
Huot, Charles. "Analyse relationnelle pour la veille technologique : vers l'analyse automatique des bases de données". Université Paul Cézanne (Aix-Marseille). Faculté des sciences et techniques de Saint-Jérôme, 1992. http://www.theses.fr/1992AIX30089.
Faucompré, Pascal. "La mise en correspondance automatique de banques de données bibliographiques scientifiques et techniques à l'aide de la Classification Internationale des Brevets : contribution au rapprochement de la science et de la technologie". Aix-Marseille 3, 1997. http://www.theses.fr/1997AIX30128.
Scientific and technical information are at the heart of technological success, which brings economic advantage. Technical innovation requires an active technological watch rather than neutral observation: it is no longer sufficient to watch and analyse the relationship between science and technology; firms and laboratories have to pursue closer links between them. On the one hand, the multiple paths between scientific databases attempt to establish exact concordances, which is incompatible with such a dialogue. On the other hand, all patent documents are classified with codes of the International Patent Classification (IPC). This classification can offer a common language for heterogeneous information. In practice, the IPC does not allow a direct link with other indexing languages because of its hierarchical structure and its complexity. However, its keyword index (catchwords) provides a useful bridge to these documentary tools. In a first stage, a correspondence system was built using these catchwords; then scientific bibliographic references were indexed with IPC codes. This tool offers end-users new relations between fundamental literature and patent documents. The analysis of the results shows, however, that automatically established paths are never bi-univocal, because they aim at widening the search area rather than closing the set of responses. From a perspective wider than the purely documentary viewpoint, this new relation can bring strategic elements to the technological watch.
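The catchword-based correspondence described in the abstract can be sketched as a simple lookup from bibliographic keywords to IPC codes. This is a minimal illustration only: the index entries below are invented for the example, whereas a real system would be derived from the IPC catchword index itself.

```python
# Illustrative catchword-to-IPC lookup table (invented entries, not real IPC data).
CATCHWORD_INDEX = {
    "semiconductor": ["H01L"],
    "laser": ["H01S"],
    "polymer": ["C08F", "C08G"],
}

def ipc_codes_for(keywords):
    """Collect every IPC code reachable from a record's keywords."""
    codes = set()
    for kw in keywords:
        # Keywords absent from the catchword index simply contribute nothing.
        codes.update(CATCHWORD_INDEX.get(kw.lower(), []))
    return codes

# A bibliographic record's keywords; "spectroscopy" is not in the toy index.
record_keywords = ["Polymer", "Laser", "spectroscopy"]
print(sorted(ipc_codes_for(record_keywords)))  # ['C08F', 'C08G', 'H01S']
```

As the abstract notes, such mappings are not bi-univocal: one keyword can map to several IPC codes, which widens rather than narrows the search area.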
Dos Santos, Raimundo N. Macedo. "Rationalisation de l'usage de la Classification Internationale des Brevets par l'analyse fonctionnelle pour répondre à la demande de l'information industrielle". Aix-Marseille 3, 1995. http://www.theses.fr/1995AIX30037.
Cherrabi, El Alaoui Nezha. "Un prisme sémantique des brevets par thésaurus interposés : positionnement, essais et applications". Electronic Thesis or Diss., Toulon, 2020. http://www.theses.fr/2020TOUL4003.
We live in an information society, characterized by an explosion of data available on the web and in various databases. Researchers in the information field stress the need for relevant information: information literacy has always been a challenge for humanity, and information must now reach a sufficient degree of reliability to avoid polluting knowledge. The patent is a multidimensional, leading source of information. The instrumented analysis of patent data is becoming a necessity and constitutes, for companies, industry and the State, a resource for measuring inventive activity efficiently and objectively. Searching patent databases is a complex task for several reasons: the number of existing patents is very high and increasing rapidly; keyword searches do not yield satisfactory results; and large companies employ professionals capable of performing targeted and efficient searches, which is often not the case for university researchers, students and other profiles. Hence the need for the machine to help experts and non-experts alike to better exploit patent information. We therefore propose a method to accompany the user in the use of this documentation. This method is based on a standardized reference system of man-made technical principles, themselves described by terminology sets that we combine with natural language processing (NLP) tools, in order to abstract away from the editorial forms of patents and to extend the associated vocabularies.
Pellier, Karine. "La dynamique structurelle et spatiale des systèmes de brevets". Thesis, Montpellier 1, 2010. http://www.theses.fr/2010MON10025.
In the wake of Schumpeter's seminal works, innovation is now positioned at the heart of economic analysis. However, since these pioneering works, few studies have been devoted to the uses of patents over time. Starting from this observation, the present thesis aims first and foremost at providing, in addition to good-quality empirical information and new statistical series, a new interpretation of patents in their structural and spatial dimensions, based on a cliometric approach. Our first contribution is to present the organisation of a new database on the long-run evolution of patents in 40 countries from the XVIIth century up to 1945 and in over 150 countries from 1945 to the present. We show in a second step that rare but significant events conditioned the heartbeat of the economic history of patents: wars, the promulgation of laws, the opening or closing of offices, but also purely statistical effects shaped, over the long term, the application and grant series under study. Furthermore, we determine the periodicity of our patent series using spectral and co-spectral analysis. Finally, we propose a more contemporary insight, in terms of convergence, into the structural and more specifically spatial dynamics at work in European patent systems.
Thenard, Yannick. "Recherche documentaire et diffusion en matière de brevets d'invention". Paris 2, 1996. http://www.theses.fr/1996PA020042.
Society grants to the inventor, for a given period of time, the exclusivity of his invention and the right to forbid the working of that invention, in accordance with the regulations laid down by the law on patents of invention. To obtain the protection granted by the law, the invention has to conform to established criteria and its inventor is committed to disclose it to the public. Certain criteria of patentability are determined by means of the documentary search. This search, which makes it possible to determine the prior art, is therefore of major importance as regards the right to forbid which is conferred upon the inventor. Given that patents of invention are considered the best source of information, what search means are used to enable the preparation of the documentary search? The diffusion of the invention is the compensation required by the law in exchange for the protection that it grants. The law now makes provision for this diffusion, which replaces the legal publicity organized heretofore. Diffusion helps the technological watch and makes it possible to constitute the documentary stocks needed for the documentary search. The two themes covered, i.e. search and diffusion in matters of patents of invention, are examined from the point of view of their evolution, an evolution closely linked to progress in information science and information processing. The future prospects of these matters are considered, and they leave open the question of the place occupied by the documentary search in the law on patents of invention.
Nadif, Mohamed. "Classification automatique et données manquantes". Metz, 1991. http://docnum.univ-lorraine.fr/public/UPV-M/Theses/1991/Nadif.Mohamed.SMZ912.pdf.
Bouquet, Valérie. "Système de veille stratégique au service de la recherche de l'innovation de l'entreprise : principes, outils, applications". Aix-Marseille 3, 1995. http://www.theses.fr/1995AIX30080.
Nivol, William. "Systèmes de surveillance systématique pour le management stratégique de l'entreprise : le traitement de l'information brevet : de l'information documentaire à l'information stratégique". Aix-Marseille 3, 1993. http://www.theses.fr/1993AIX30030.
Audebert, Nicolas. "Classification de données massives de télédétection". Thesis, Lorient, 2018. http://www.theses.fr/2018LORIS502/document.
Thanks to high-resolution imaging systems and the multiplication of data sources, Earth observation (EO) with satellite or aerial images has entered the age of big data. This allows the development of new applications (EO data mining, large-scale land-use classification, etc.) and the use of tools from information retrieval, statistical learning and computer vision that were not possible before due to the lack of data. This project is about designing an efficient classification scheme that can benefit from very high resolution and large datasets (if possible labelled) for creating thematic maps. Targeted applications include urban land use, geology and vegetation for industrial purposes. The objective of the PhD thesis is to develop new statistical tools for the classification of aerial and satellite images. Beyond state-of-the-art approaches that combine a local spatial characterization of the image content and supervised learning, machine learning approaches which benefit from large labelled training datasets, such as deep neural networks, are particularly investigated. The main issues are (a) structured prediction (how to incorporate knowledge about the underlying spatial and contextual structure?), (b) data fusion from various sensors (how to merge heterogeneous data such as SAR, hyperspectral and Lidar into the learning process?), (c) physical plausibility of the analysis (how to include prior physical knowledge in the classifier?) and (d) scalability (how to make the proposed solutions tractable in the presence of big remote sensing data?).
BARRA, Vincent. "Modélisation, classification et fusion de données biomédicales". Habilitation à diriger des recherches, Université Blaise Pascal - Clermont-Ferrand II, 2004. http://tel.archives-ouvertes.fr/tel-00005998.
Silva, Gonçalves da Costa Lorga da Ana Isabel. "Données manquantes et méthodes d'imputation en classification". Paris, CNAM, 2005. http://www.theses.fr/2005CNAM0719.
The aim of this work is to study the effect of missing data in clustering, mainly in agglomerative hierarchical clustering and also in non-hierarchical clustering (partitioning). The study considers the following factors: percentage of missing data, imputation methods, resemblance coefficients and clustering criteria. The data are assumed to be missing at random, but not completely at random, and to follow a mainly monotone missingness pattern. As techniques without imputation we used the listwise and pairwise methods; as single imputation methods, the EM algorithm, the OLS regression model, the NIPALS algorithm and a PLS regression method; and, as multiple imputation methods, methods based on PLS regression. To combine the clustering structures resulting from multiple imputation, we proposed a combination by averaging the similarity matrices and two consensus methods. As hierarchical clustering methods we used single linkage, complete linkage, average linkage between groups, as well as AVL and AVB; for the resemblance matrices, the basic affinity coefficient (for continuous data), which corresponds to the Ochiai index for binary data; the Bravais-Pearson correlation coefficient; and the probabilistic approximation of the affinity coefficient, centred and reduced by the W-method. The study is based mainly on simulated data and completed by applications to real data.
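The single-imputation step studied in this thesis can be illustrated with a deliberately simpler stand-in: column-mean imputation before computing a resemblance matrix between variables. The thesis itself uses EM, OLS, NIPALS and PLS regression; the sketch below only shows where imputation sits in the clustering pipeline.

```python
import numpy as np

def mean_impute(X):
    """Replace missing entries (NaN) by their column means.
    A simple baseline, not one of the thesis's imputation methods."""
    X = X.copy()
    col_means = np.nanmean(X, axis=0)        # per-variable mean, ignoring NaN
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))                  # 50 observations, 4 variables
X[rng.random(X.shape) < 0.1] = np.nan         # ~10% of values missing
Xi = mean_impute(X)
# Resemblance matrix between variables, ready for hierarchical clustering.
R = np.corrcoef(Xi, rowvar=False)
```

Comparing such a matrix across imputation methods (and across missingness rates) is precisely the kind of factor-by-factor comparison the study carries out.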
Borges, Gouvea Barroso Wanise. "Elaboration et mise à disposition d'une base de données de documents de brevet tombés dans le domaine public". Toulon, 2003. http://www.theses.fr/2003TOUL0005.
In this work, we present the advantages of elaborating a tool that will contribute to the technological and economic development of Brazilian enterprises, mainly SMEs. This tool consists of a database of patent documents in the public domain, i.e. inventions that can be legally and freely exploited, reproduced or improved by interested enterprises without payment of royalties, because they are public-domain technology in Brazilian territory, enabling technological and economic gains for Brazilian enterprises and for Brazil. We traced the profile of the patent documents filed in Brazil through the database of the Brazilian Trademark and Patent Office (INPI), and verified that the Office holds about 250,000 patent documents from 1971 to 2002, of which approximately 140,000 (56%) are in the public domain. This database contains inventions from all technological areas, the largest numbers of documents occurring in the "human necessities" and "chemistry" areas.
Rabah, Mazouzi. "Approches collaboratives pour la classification des données complexes". Thesis, Paris 8, 2016. http://www.theses.fr/2016PA080079.
This thesis focuses on collaborative classification in the context of complex data, in particular Big Data; we used several computational paradigms to propose new approaches based on HPC technologies. In this context, we aim at building massive ensembles, in the sense that the number of elementary classifiers making up the multiple classifier system can be very high. In this case, conventional methods of interaction between classifiers are no longer valid and we had to propose new forms of interaction, in which one is not constrained to take all classifiers' predictions to build an overall prediction. Accordingly, we faced two problems: the first is the ability of our approaches to scale up; the second is the diversity that must be created and maintained within the system to ensure its performance. We therefore studied the distribution of classifiers in a cloud-computing environment; such a multiple classifier system can be massive, and its properties are those of a complex system. In terms of data diversity, we proposed a training data enrichment approach that generates synthetic data from analytical models describing part of the studied phenomenon, so that the mixture of data reinforces classifier learning. The experiments conducted have shown a great potential for substantial improvement of classification results.
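The idea of not polling every elementary classifier can be sketched with a majority vote over a random subset of the ensemble. This is an illustrative stand-in for the partial-interaction schemes the thesis proposes, not the exact method.

```python
import numpy as np

def subset_vote(pred_matrix, k, seed=0):
    """Majority vote over a random subset of k elementary classifiers.

    pred_matrix: one row per classifier, one column per instance,
    binary predictions in {0, 1}. Sampling a subset avoids aggregating
    the full (potentially massive) ensemble.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(pred_matrix.shape[0], size=k, replace=False)
    return (pred_matrix[idx].mean(axis=0) >= 0.5).astype(int)

# 1000 elementary classifiers, 3 instances: unanimous "1" on instance 0,
# unanimous "0" on instance 1, evenly split on instance 2.
preds = np.zeros((1000, 3), dtype=int)
preds[:, 0] = 1
preds[:500, 2] = 1
print(subset_vote(preds, k=31))
```

Where the ensemble is unanimous, any subset reproduces the full vote; where it is split, the subset's answer varies with the sample, which is exactly why diversity management matters in such systems.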
Girard, Régis. "Classification conceptuelle sur des données arborescentes et imprécises". La Réunion, 1997. http://elgebar.univ-reunion.fr/login?url=http://thesesenligne.univ.run/97_08_Girard.pdf.
Gomes, da Silva Alzennyr. "Analyse des données évolutives : Application aux données d'usage du Web". Paris 9, 2009. https://portail.bu.dauphine.fr/fileviewer/index.php?doc=2009PA090047.
Nowadays, more and more organizations are becoming reliant on the Internet. The Web has become one of the most widespread platforms for information exchange and retrieval. The growing number of traces left behind by user transactions (e.g. customer purchases, user sessions, etc.) automatically increases the importance of usage data analysis. Indeed, the way in which a web site is visited can change over time, and these changes can be related to temporal factors (day of the week, seasonality, periods of special offers, etc.). Consequently, the usage models must be continuously updated in order to reflect the current behaviour of visitors. Such a task remains difficult when the temporal dimension is ignored or simply introduced into the data description as a numeric attribute. It is precisely on this challenge that the present thesis is focused. In order to deal with the problem of acquiring real usage data, we propose a methodology for the automatic generation of artificial usage data over which one can control the occurrence of changes and thus analyse the efficiency of a change detection system. Guided by tracks born of some exploratory analyses, we propose a tilted-window approach for detecting and following up changes in evolving usage data. In order to measure the level of change, this approach applies two external evaluation indices based on the clustering extension. The proposed approach also characterizes the changes undergone by the usage groups (e.g. appearance, disappearance, fusion and split) at each timestamp. Moreover, it is totally independent of the clustering method used and is able to manage kinds of data other than usage data. The effectiveness of this approach is evaluated on artificial data sets of different degrees of complexity and also on real data sets from different domains (academic, tourism, e-business and marketing).
Hajjar, Chantal. "Cartes auto-organisatrices pour la classification de données symboliques mixtes, de données de type intervalle et de données discrétisées". Thesis, Supélec, 2014. http://www.theses.fr/2014SUPL0066/document.
This thesis concerns the clustering of symbolic data with bio-inspired geometric methods, more specifically with self-organizing maps. We set up several learning algorithms for self-organizing maps in order to cluster mixed-feature symbolic data as well as interval-valued and binned data. Several simulated and real symbolic data sets, including two built as part of this thesis, are used to test the proposed methods. In addition, we propose a self-organizing map for binned data in order to accelerate the learning of standard maps, and we use the proposed method for image segmentation.
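As a point of reference for the adaptations the thesis develops, here is a minimal standard self-organizing map in plain NumPy. Interval-valued variables are handled by the simplification of stacking lower and upper bounds as ordinary features; the thesis's own algorithms for symbolic data are more elaborate than this sketch.

```python
import numpy as np

def train_som(data, grid=(3, 3), epochs=60, lr0=0.5, sigma0=1.0, seed=0):
    """Train a small SOM with the classic online learning rule."""
    rng = np.random.default_rng(seed)
    coords = np.array([(i, j) for i in range(grid[0])
                       for j in range(grid[1])], dtype=float)
    W = rng.normal(size=(len(coords), data.shape[1]))   # prototype vectors
    for t in range(epochs):
        frac = 1.0 - t / epochs                          # decay schedule
        lr, sigma = lr0 * frac, sigma0 * frac + 0.1
        for x in data:
            bmu = np.argmin(np.linalg.norm(W - x, axis=1))       # best matching unit
            d2 = np.sum((coords - coords[bmu]) ** 2, axis=1)     # grid distance
            h = np.exp(-d2 / (2.0 * sigma ** 2))                 # neighbourhood
            W += lr * h[:, None] * (x - W)                       # prototype update
    return W

# Interval data [low, high] encoded as two features per observation:
# one group of intervals around [0, 1], another around [10, 11].
rng = np.random.default_rng(1)
a = rng.normal([0.0, 1.0], 0.1, size=(30, 2))
b = rng.normal([10.0, 11.0], 0.1, size=(30, 2))
W = train_som(np.vstack([a, b]))
```

After training, the prototypes spread across the two groups of intervals, so map units can serve directly as cluster representatives.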
Jeannin, Akodjénou Marc-Ismaël. "Clustering et volume des données". Paris 6, 2008. http://www.theses.fr/2009PA066270.
Chavent, Marie. "Analyse de données symboliques : une méthode divisive de classification". Paris 9, 1997. https://portail.bu.dauphine.fr/fileviewer/index.php?doc=1997PA090029.
Marchetti, Franck. "Contribution à la classification de données binaires et qualitatives". Metz, 1989. http://docnum.univ-lorraine.fr/public/UPV-M/Theses/1989/Marchetti.Franck.SMZ897.pdf.
We propose several clustering methods specific to binary and categorical data, each time trying to stay close to the initial data structure. These methods provide partitions optimising criteria defined with the absolute value (L1) distance. The advantage of this approach is to give results that are easy to interpret with regard to the initial data. We then define an inertia on the binary space. This binary inertia behaves like an ordinary inertia: a relation of the Huyghens type and a decomposition of the inertia are demonstrated. The clustering method and the crossed clustering method for binary data can thus be placed in a more usual context: they respectively optimise an inertia criterion and a measure of information. An agglomerative hierarchical method for binary data is also proposed. Finally, we study a principal components analysis for binary data. This analysis, defined with binary factors, can be used to find homogeneous submatrices. All the methods proposed here have been programmed and integrated into the SICLA system.
Llobell, Fabien. "Classification de tableaux de données, applications en analyse sensorielle". Thesis, Nantes, Ecole nationale vétérinaire, 2020. http://www.theses.fr/2020ONIR143F.
Multiblock datasets are more and more frequent in several areas of application. This is particularly the case in sensory evaluation, where several tests lead to multiblock datasets, each dataset being related to a subject (judge, consumer, ...). The statistical analysis of this type of data has raised increasing interest over the last thirty years. However, the clustering of multiblock datasets has received little attention, even though there is an important need for it. In this context, a method called CLUSTATIS, devoted to the cluster analysis of datasets, is proposed. At the heart of this approach is the STATIS method, a multiblock data analysis strategy. Several extensions of the CLUSTATIS clustering method are presented. In particular, the case of data from the so-called "Check-All-That-Apply" (CATA) task is considered, and an ad hoc clustering method called CLUSCATA is discussed. In order to improve the homogeneity of clusters from both CLUSTATIS and CLUSCATA, an option to add an additional cluster, called "K+1", is introduced; its purpose is to collect datasets identified as atypical. The choice of the number of clusters is discussed, and solutions are proposed. Applications in sensory analysis as well as simulation studies highlight the relevance of the clustering approach. Implementations in the XLSTAT software and in the R environment are presented.
Rodriguez-Rojas, Oldemar. "Classification et modèles linéaires en analyse des données symboliques". Paris 9, 2000. https://portail.bu.dauphine.fr/fileviewer/index.php?doc=2000PA090064.
El Assaad, Hani. "Modélisation et classification dynamique de données temporelles non stationnaires". Thesis, Paris Est, 2014. http://www.theses.fr/2014PEST1162/document.
Nowadays, diagnosis and monitoring for the predictive maintenance of railway components are key subjects for both operators and manufacturers. They seek to anticipate upcoming maintenance actions, reduce maintenance costs and increase the availability of the rail network. In order to maintain the components at a satisfactory level of operation, the implementation of a reliable diagnostic strategy is required. In this thesis, we are interested in a main component of railway infrastructure, the railway switch, an important safety device whose failure could heavily impact the availability of the transportation system. The diagnosis of this system is therefore essential and can be done by exploiting sequential measurements acquired successively while the state of the system evolves over time. These measurements consist of power consumption curves acquired during several switch operations, whose shape is indicative of the operating state of the system. The aim is to track the temporal evolution of the railway component's state under different operating contexts by analyzing these data, in order to detect and diagnose problems that may lead to failure. This thesis tackles the problem of temporal data clustering within a broader context of developing innovative decision-aid tools and methods. We propose a new dynamic probabilistic approach within a temporal data clustering framework, based on both Gaussian mixture models and state-space models. The main challenge facing this work is the estimation of the model parameters associated with this approach, because of its complex structure; in order to meet this challenge, a variational approach has been developed. The results obtained on both synthetic and real data highlight the advantage of the proposed algorithms compared to other state-of-the-art methods in terms of clustering and estimation accuracy.
Tisserant, Guillaume. "Généralisation de données textuelles adaptée à la classification automatique". Thesis, Montpellier, 2015. http://www.theses.fr/2015MONTS231/document.
Text classification has been studied for a long time. Early on, many documents of different types were grouped together in order to centralize knowledge, and classification and indexing systems were created to make it easy to find documents according to readers' needs. With the growing number of documents and the appearance of computers and the internet, the implementation of text classification systems has become a critical issue. However, textual data, complex and rich by nature, are difficult to process automatically. In this context, this thesis proposes an original methodology to organize textual information and facilitate access to it. Our automatic classification approach and our semantic information extraction enable relevant information to be found quickly. Specifically, this manuscript presents new forms of text representation facilitating their processing for automatic classification. A partial generalization of textual data (the GenDesc approach), based on statistical and morphosyntactic criteria, is proposed. Moreover, this thesis focuses on phrase construction and on the use of semantic information to improve the representation of documents. We demonstrate through numerous experiments the relevance and genericity of our proposals and show that they improve classification results. Finally, as social networks are developing strongly, a method for the automatic generation of semantic hashtags is proposed. Our approach is based on statistical measures, semantic resources and the use of syntactic information. The generated hashtags can then be exploited for information retrieval tasks on large volumes of data.
Gomes, Da Silva Alzennyr. "Analyse des données évolutives : application aux données d'usage du Web". Phd thesis, Université Paris Dauphine - Paris IX, 2009. http://tel.archives-ouvertes.fr/tel-00445501.
Pełny tekst źródłaBlanchard, Frédéric. "Visualisation et classification de données multidimensionnelles : Application aux images multicomposantes". Reims, 2005. http://theses.univ-reims.fr/exl-doc/GED00000287.pdf.
The analysis of multicomponent images is a crucial problem, and visualization and clustering are two relevant questions about it. We decided to work in the more general frame of data analysis to answer these questions. The preliminary step of this work is to describe the problems induced by dimensionality and to study the current dimensionality reduction methods. The visualization problem is then considered and a contribution is exposed: we propose a new method of visualization through color images that provides an immediate and synthetic view of the data. Applications are presented. The second contribution lies upstream of the clustering procedure strictly speaking. We establish a new kind of data representation by using rank transformation, fuzziness and aggregation procedures. Its use improves clustering procedures by dealing with clusters of dissimilar density or varying sizes and by making them more robust. This work presents two important contributions to the field of data analysis applied to multicomponent images. The variety of the tools involved (originally from decision theory, uncertainty management, data mining or image processing) makes the presented methods usable in many diversified areas as well as in multicomponent image analysis.
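The rank transformation underlying this representation can be sketched in a few lines: each column of the data matrix is replaced by the ranks of its values, which discards scale and makes subsequent clustering less sensitive to unequal densities. This is only the plain rank step, without the fuzziness and aggregation procedures the thesis combines it with.

```python
import numpy as np

def rank_transform(X):
    """Replace each column of X by the ranks of its values (0 = smallest)."""
    ranks = np.empty(X.shape, dtype=int)
    order = np.argsort(X, axis=0)          # row indices sorted per column
    rows = np.arange(X.shape[0])
    for j in range(X.shape[1]):
        ranks[order[:, j], j] = rows       # invert the sort to get ranks
    return ranks

X = np.array([[3.0, 10.0],
              [1.0, 30.0],
              [2.0, 20.0]])
print(rank_transform(X))   # [[2 0], [0 2], [1 1]]
```

Note that ranks, unlike raw values, are bounded and uniformly spread, which is what tempers the influence of dense or large clusters on the distance computations.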
Lomet, Aurore. "Sélection de modèle pour la classification croisée de données continues". Compiègne, 2012. http://www.theses.fr/2012COMP2041.
Jollois, François-Xavier. "Contribution de la classification automatique à la fouille de données". Metz, 2003. http://docnum.univ-lorraine.fr/public/UPV-M/Theses/2003/Jollois.Francois_Xavier.SMZ0311.pdf.
Le, Thanh Van. "Classification prétopologique des données : application à l'analyse des trajectoires patients". Lyon 1, 2007. http://www.theses.fr/2007LYO10296.
The objective of my work is to develop clustering methods based on a new mathematical concept: pretopology. My approach was driven by two major concerns: (1) to develop methods applicable to complex data for which metric concepts cannot be used, due to the nature of these data; (2) to integrate various points of view in the clustering process. Thus, the proposed methods are founded only on families of reflexive binary relationships between the objects to be clustered. Three methods are developed, following two steps. The first consists in using the minimal closed subset algorithm to get a covering of the data. The second uses this covering to extract "centers" from which the final clustering is determined. The number of clusters is then predetermined by the number of centers, corresponding to information given by the minimal closed subsets; this means the number of clusters is derived only from the data itself. A software tool has been developed to test our approach on complex data issued from the PMSI (the French DRG system), allowing us to answer questions related to the concept of "patient profile" inside health care delivery organizations (public hospitals and private clinics).
Gallopin, Mélina. "Classification et inférence de réseaux pour les données RNA-seq". Thesis, Université Paris-Saclay (ComUE), 2015. http://www.theses.fr/2015SACLS174/document.
This thesis gathers methodological contributions to the statistical analysis of next-generation high-throughput transcriptome sequencing data (RNA-seq). RNA-seq data are discrete and the number of samples sequenced is usually small due to the cost of the technology; these two points are the main statistical challenges for modelling RNA-seq data. The first part of the thesis is dedicated to the co-expression analysis of RNA-seq data using model-based clustering. A natural model for discrete RNA-seq data is a Poisson mixture model; however, a Gaussian mixture model in conjunction with a simple transformation applied to the data is a reasonable alternative. We propose to compare the two alternatives using a data-driven criterion to select the model that best fits each dataset. In addition, we present a model selection criterion that takes external gene annotations into account. This criterion is not specific to RNA-seq data: it is useful in any co-expression analysis using model-based clustering designed to enrich functional annotation databases. The second part of the thesis is dedicated to network inference using graphical models, the aim being to detect relationships among genes based on their expression. We propose a network inference model based on a Poisson distribution, taking into account the discrete nature and high inter-sample variability of RNA-seq data. However, network inference methods require a large number of samples. For Gaussian graphical models, we propose a non-asymptotic approach to detect relevant subsets of genes based on a block-diagonal decomposition of the covariance matrix. This method is not specific to RNA-seq data and reduces the dimension of any network inference problem based on the Gaussian graphical model.
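The "simple transformation" route the abstract mentions can be sketched as library-size scaling followed by a log transform, after which the counts are amenable to Gaussian model-based clustering. The exact transformation and scaling scheme used in the thesis may differ; this is a common simple stand-in.

```python
import numpy as np

def log_normalize(counts):
    """Scale counts by relative sequencing depth, then apply log2(x + 1).

    counts: genes x samples matrix of non-negative read counts.
    The size-factor definition below (library size over mean library
    size) is a deliberately simple choice for illustration.
    """
    lib = counts.sum(axis=0, keepdims=True).astype(float)  # reads per sample
    size_factors = lib / lib.mean()                        # relative depth
    return np.log2(counts / size_factors + 1.0)

counts = np.array([[ 0, 10,  5],
                   [20, 40, 10],
                   [ 5,  0,  5]])
Y = log_normalize(counts)
```

A Gaussian mixture fitted to `Y` and a Poisson mixture fitted to the raw `counts` can then be compared with a data-driven criterion, which is the comparison the thesis formalizes.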
Blanchet, Juliette. "Modèles Markoviens et extensions pour la classification de données complexes". Phd thesis, Grenoble 1, 2007. http://www.theses.fr/2007GRE10148.
Pełny tekst źródła
We address the issue of clustering individuals from « complex » observations, in the sense that they do not verify some of the classically adopted simplifying assumptions. In this work, the individuals to be clustered are assumed to be dependent upon one another. Three clustering problems are considered. The first relates to high-dimensional data clustering. For such a problem, we adopt a non-diagonal Gaussian Markovian model based upon the fact that most high-dimensional data actually live in class-dependent subspaces of lower dimension. Such a model only requires the estimation of a reasonable number of parameters. The second point attempts to go beyond the simplifying assumption of unimodal, and in particular Gaussian, independent noise. We consider for this the recent triplet Markov field model and propose a new family of triplet Markov field models adapted to the framework of supervised classification. We illustrate the flexibility and performance of our models on a real texture image recognition application. Finally, we tackle the problem of clustering with incomplete observations, i.e. observations for which some values are missing. For this we develop a Markovian method that does not require preliminary imputation of the missing data. We present an application of this methodology to a real gene clustering problem.
Blanchet, Juliette. "Modèles markoviens et extensions pour la classification de données complexes". Phd thesis, Université Joseph Fourier (Grenoble), 2007. http://tel.archives-ouvertes.fr/tel-00195271.
Pełny tekst źródła
The first problem concerns the clustering of high-dimensional data. For such a problem, we adopt a non-diagonal Gaussian Markovian model that takes advantage of the fact that most high-dimensional observations actually live in subspaces specific to each class, whose intrinsic dimensions are small. As a result, the number of free parameters of the model remains reasonable.
The second point addressed aims at relaxing the simplifying assumption of unimodal, and in particular Gaussian, independent noise. To this end, we consider the recent triplet Markov field model and propose a new family of triplet Markov fields adapted to the framework of supervised classification. We illustrate the flexibility and performance of our models on a real texture image recognition application.
Finally, we are interested in the problem of clustering so-called incomplete observations, that is, observations for which some values are missing. To this end, we develop a Markovian method that does not require prior imputation of the missing values. We present an application of this methodology to a real gene clustering problem.
Vescovo, Laure. "Outils et méthodes pour la classification pyramidale de données biologiques". Evry-Val d'Essonne, 2007. http://www.theses.fr/2007EVRY0006.
Pełny tekst źródła
The sequencing of complete genomes produces large amounts of data, and comparative genomics introduces new problems. We focus on improving pyramidal classification for its properties, which allow representations close to the data. The algorithm that computes the pyramids introduces a significant bias. We propose two filtering approaches to correct it: an optimal solution, carried out by isotonic regression, and a heuristic approach. We also present an algorithm for obtaining the pyramid after the filtering step. We apply pyramids to progressive multiple sequence alignment, which uses a guide structure to define the order in which the sequences are aligned. We studied the influence of this structure; this step is important and an adapted method should be used. We also propose a mixed alignment approach, based on local and global alignment strategies, starting from the pyramids.
Brossier, Gildas. "Problèmes de représentation de données par des arbres". Rennes 2, 1986. http://www.theses.fr/1986REN20014.
Pełny tekst źródła
First, we study the properties of distance tables associated with tree representations, and the relations between these distances. We then define ordered representations, construct a class of ordering algorithms and study their optimality properties under different conditions. The decomposition properties of distance tables allow us to construct fast representation algorithms with some optimality properties. We extend these results to the case where the data are asymmetric matrices. Finally, in the case of rectangular matrices, we give necessary and sufficient conditions for the simultaneous representation of two sets of data; when these conditions are not satisfied, we propose approximation algorithms.
Diatta, Jean. "Une extension de la classification hiérarchique : les quasi-hiérarchies". Aix-Marseille 1, 1996. http://www.theses.fr/1996AIX11023.
Pełny tekst źródła
Elisabeth, Erol. "Fouille de données spatio-temporelles, résumés de données et apprentissage automatique : application au système de recommandations touristique, données médicales et détection des transactions atypiques dans le domaine financier". Thesis, Antilles, 2021. http://www.theses.fr/2021ANTI0607.
Pełny tekst źródła
Data mining is one of the components of Customer Relationship Management (CRM), widely deployed in companies. It is the process of extracting interesting, non-trivial, implicit, unknown and potentially useful knowledge from data. This process relies on algorithms from various scientific disciplines (statistics, artificial intelligence, databases) to build models from data stored in data warehouses. The objective is to determine models, built from clusters, that improve knowledge of the customer in the generic sense, predict his behavior and optimize the proposed offer. Since these models are intended to be used by specialists in the field of data, researchers in health economics and management sciences, or professionals in the sector studied, this research work emphasizes the usability of data mining environments. This thesis is concerned with spatio-temporal data mining. It particularly highlights an original approach to data processing with the aim of enriching practical knowledge in the field. This work includes an applied component in four chapters, corresponding to four systems developed:
- A model for setting up a recommendation system based on the collection of GPS positioning data,
- A data summary tool optimized for fast responses to queries within the programme for the medicalization of information systems (PMSI),
- A machine learning tool for the fight against money laundering in the financial system,
- A model for the prediction of activity in weather-dependent VSEs (tourism, transport, leisure, commerce, etc.). The problem here is to identify classification algorithms and neural networks for data analysis aimed at adapting the company's strategy to economic changes.
Guillemot, Vincent. "Application de méthodes de classification supervisée et intégration de données hétérogènes pour des données transcriptomiques à haut-débit". Phd thesis, Université Paris Sud - Paris XI, 2010. http://tel.archives-ouvertes.fr/tel-00481822.
Pełny tekst źródła
Vandromme, Maxence. "Optimisation combinatoire et extraction de connaissances sur données hétérogènes et temporelles : application à l’identification de parcours patients". Thesis, Lille 1, 2017. http://www.theses.fr/2017LIL10044.
Pełny tekst źródła
Hospital data exhibit numerous specificities that make traditional data mining tools hard to apply. In this thesis, we focus on the heterogeneity of hospital data and on their temporal aspect. This work is done within the framework of the ANR ClinMine research project and a CIFRE partnership with the Alicante company. We propose two new knowledge discovery methods suited to hospital data, each able to perform a variety of tasks: classification, prediction, discovering patient profiles, etc. In the first part, we introduce MOSC (Multi-Objective Sequence Classification), an algorithm for supervised classification on heterogeneous, numeric and temporal data. In addition to binary and symbolic terms, this method uses numeric terms and sequences of temporal events to form sets of classification rules. MOSC is the first classification algorithm able to handle these types of data simultaneously. In the second part, we introduce HBC (Heterogeneous BiClustering), a biclustering algorithm for heterogeneous data, a problem that had never been studied so far. This algorithm is extended to support temporal data of various types: temporal events and unevenly sampled time series. HBC is used for a case study on a set of hospital data, whose goal is to identify groups of patients sharing a similar profile. The results make sense from a medical viewpoint; they indicate that relevant, and sometimes new, knowledge is extracted from the data. These results also lead to further, more precise case studies. The integration of HBC into a software product is also under way, with the implementation of a parallel version and a visualization tool for biclustering results.
Juery, Damien. "Classification bayésienne non supervisée de données fonctionnelles en présence de covariables". Thesis, Montpellier 2, 2014. http://www.theses.fr/2014MON20160/document.
Pełny tekst źródła
One of the major objectives of unsupervised clustering is to find similarity groups in a dataset. With the current development of phenotyping, in which continuous-time data are collected, more and more users require new efficient tools capable of clustering curves. The work presented in this thesis is based on Bayesian statistics. Specifically, we are interested in unsupervised Bayesian clustering of functional data. Nonparametric Bayesian priors allow the construction of flexible and robust models. We generalize a clustering model (DPM), founded on the Dirichlet process, to the functional framework. Unlike current methods, which resort to finite-dimensional representations, either by representing curves as linear combinations of basis functions or by regarding curves as data points, calculations are here carried out on complete curves, in infinite dimension. The reproducing kernel Hilbert space (RKHS) theory allows us to derive, in infinite dimension, probability density functions of curves with respect to a Gaussian measure. In the same way, we make explicit a posterior distribution given complete curves, and not only data points. We suggest generalizing the "Gibbs sampling with auxiliary parameters" algorithm of Neal (2000). The numerical implementation requires the calculation of inner products, which are approximated by numerical methods. Some case studies on real and simulated data are also presented and discussed. Finally, the addition of an extra hierarchy in our model allows us to take functional covariates into account. For that purpose, we show that it is possible to define several models. The previous algorithmic method is then extended to each of these models. Some case studies on simulated data are presented.
D'ambrosio, Roberto. "Classification de bases de données déséquilibrées par des règles de décomposition". Phd thesis, Université Nice Sophia Antipolis, 2014. http://tel.archives-ouvertes.fr/tel-00995021.
Pełny tekst źródła
Mure, Simon. "Classification non supervisée de données spatio-temporelles multidimensionnelles : Applications à l’imagerie". Thesis, Lyon, 2016. http://www.theses.fr/2016LYSEI130/document.
Pełny tekst źródła
Due to the dramatic increase of longitudinal acquisitions in the past decades, such as video sequences, global positioning system (GPS) tracking or medical follow-up, many applications for time-series data mining have been developed. Unsupervised time-series data mining has thus become highly relevant, with the aim of automatically detecting and identifying similar temporal patterns between time series. In this work, we propose a new spatio-temporal filtering scheme based on the mean-shift procedure, a state-of-the-art approach in the field of image processing, which clusters multivariate spatio-temporal data. We also propose a hierarchical time-series clustering algorithm based on the dynamic time warping measure that identifies similar but asynchronous temporal patterns. Our choices have been motivated by the need to analyse magnetic resonance images acquired on people affected by multiple sclerosis. The genetic and environmental factors triggering and governing the disease evolution, as well as the occurrence and evolution of individual lesions, are still mostly unknown and under intense investigation. There is therefore a strong need for new methods allowing automatic extraction and quantification of lesion characteristics. This has motivated our work on time-series clustering methods, which are not yet widely used in image processing and allow image sequences to be processed without prior knowledge of the final results.
Samé, Allou Badara. "Modèles de mélange et classification de données acoustiques en temps réel". Compiègne, 2004. http://www.theses.fr/2004COMP1540.
Pełny tekst źródła
The motivation for this PhD thesis was a real-time flaw diagnosis application for pressurized containers using acoustic emissions. It was carried out in collaboration with the Centre Technique des Industries Mécaniques (CETIM). The aim was to improve LOTERE, a real-time computer-aided decision software tool, which had been found to be too slow when the number of acoustic emissions becomes large. Two mixture-model-based clustering approaches, taking time constraints into account, have been proposed. The first consists in clustering 'bins' resulting from the conversion of the original observations into a histogram. The second is an online approach that updates the classification recursively. An experimental study using both simulated and real data has shown that the proposed methods are very efficient.
D'Ambrosio, Roberto. "Classification de bases de données déséquilibrées par des règles de décomposition". Thesis, Nice, 2014. http://www.theses.fr/2014NICE4007/document.
Pełny tekst źródła
Disproportion among class priors is encountered in a large number of domains, making conventional learning algorithms less effective at predicting samples belonging to the minority classes. We aim at developing a reconstruction rule suited to multiclass skewed data. In performing this task we use the classification reliability, which conveys useful information on the goodness of classification acts. Within the One-per-Class decomposition scheme, we design a novel reconstruction rule, Reconstruction Rule by Selection, which uses classifier reliabilities, crisp labels and a-priori distributions to compute the final decision. Tests show that system performance improves using this rule rather than well-established reconstruction rules. We also investigate rules in the Error Correcting Output Code (ECOC) decomposition framework. Inspired by a statistical reconstruction rule designed for the One-per-Class and Pair-Wise Coupling decomposition approaches, we have developed a rule that applies softmax regression on reliability outputs in order to estimate the final classification. Results show that this choice improves performance with respect to the existing statistical rule and to well-established reconstruction rules. On the topic of reliability estimation, we notice that little attention has been given to efficient posterior estimation in the boosting framework. For this reason we develop an efficient posterior estimator by boosting Nearest Neighbours. Using the Universal Nearest Neighbours classifier, we prove that a sub-class of surrogate losses exists whose minimization yields simple and statistically efficient estimators of Bayes posteriors.
Nair, Benrekia Noureddine Yassine. "Classification interactive multi-label pour l’aide à l’organisation personnalisée des données". Nantes, 2015. https://archive.bu.univ-nantes.fr/pollux/show/show?id=bb2e3d25-7f53-4b66-af04-a9fb5e80ea28.
Pełny tekst źródła
The growing importance given today to personalized content has led to the development of several interactive classification systems for various novel applications. Nevertheless, all these systems use single-label item classification, which greatly constrains the user's expressiveness. The major problem common to all developers of an interactive multi-label system is: which multi-label classifier should we choose? Experimental evaluations of recent interactive learning systems are mainly subjective, and the importance of their conclusions is consequently limited. To draw more general conclusions for guiding the selection of a suitable learning algorithm during the development of such a system, we extensively study the impact of the major interactivity constraints (learning from few examples in a limited time) on the classifier's predictive and computation-time performance. The experiments demonstrate the potential of an ensemble learning approach, Random Forest of Predictive Clustering Trees (RF-PCT). However, the strong constraint imposed by interactivity on computation time has led us to propose a new hybrid learning approach, FMDI-RF+, which associates RF-PCT with an efficient matrix factorization approach for dimensionality reduction. The experimental results indicate that FMDI-RF+ is as accurate as RF-PCT in the predictions, with a significant advantage to FMDI-RF+ in computation speed.
Botte-Lecocq, Claudine. "L'analyse de données multidimensionnelles par transformations morphologiques binaires". Lille 1, 1991. http://www.theses.fr/1991LIL10142.
Pełny tekst źródła
Kettaf, Fatima-Zohra. "Contribution des algorithmes évolutionnaires au partitionnement des données". Tours, 1997. http://www.theses.fr/1997TOUR4008.
Pełny tekst źródła
In this work we are interested in the contribution of evolutionary algorithms (genetic algorithms and evolution strategies) to the partitioning problem. It is well known today that evolutionary methods, with their implicit parallelism, are good at performing a global search in the space of possible solutions of the problem at hand, and that they do not need any prior modelling of the data. We propose new partition encodings, with known or unknown numbers of clusters, adapted to different clustering models (exclusive, fuzzy, possibilistic, mixture model, etc.). These encodings are: belongness, prototype, and similitude. The algorithms we propose are based on these encodings and search the space of partitions for the number of clusters and the "optimal" partition with regard to a predefined criterion. One original aspect of this thesis is the use of variable-length chromosomes, which easily adapt to partition encodings with a variable number of clusters. We also introduce new genetic operators: insertion and deletion of clusters, as in the ISODATA algorithm. Finally, we give experimental results of our algorithms on simulated and real data and compare them to the GMVE and ISODATA algorithms.
Pigeau, Antoine. "Structuration géo-temporelle de données multimédia personnelles". Phd thesis, Nantes, 2005. http://www.theses.fr/2005NANT2131.
Pełny tekst źródła
The usage of mobile devices raises the need to organize large personal multimedia collections. The present work focuses on personal image collections acquired from mobile phones equipped with a camera. We treat the structuring of an image collection as a clustering problem. Our solution consists in building two distinct temporal and spatial partitions, based on the temporal and spatial metadata of each image. The main ingredients of our approach are Gaussian mixture models and the ICL criterion to determine the model complexities. First, we propose an incremental optimization algorithm to build non-hierarchical partitions automatically. It is then combined with an agglomerative algorithm to provide an incremental hierarchical algorithm. Finally, two techniques are proposed to build hybrid spatio-temporal classifications taking into account human-machine interaction constraints.
Jaziri, Rakia. "Modèles de mélanges topologiques pour la classification de données structurées en séquences". Paris 13, 2013. http://scbd-sto.univ-paris13.fr/secure/edgalilee_th_2013_jaziri.pdf.
Pełny tekst źródła
Recent years have seen the development of data mining techniques in various application areas, with the purpose of analyzing sequential, large and complex data. In this work, the problems of clustering, visualization and structuring data are tackled by a three-stage proposal. The first proposal presents a generative approach to learn a new probabilistic self-organizing map (PrSOMS) for non-independent and non-identically distributed data sets. Our model defines a low-dimensional manifold allowing friendly visualizations. To yield topology-preserving maps, our model exhibits SOM-like learning behavior with the advantages of probabilistic models. This new paradigm uses the HMM (Hidden Markov Model) formalism and introduces relationships between the states, which allows us to take advantage of all the classical views associated with topographic maps. The second proposal concerns a hierarchical extension of the PrSOMS approach, which addresses the complex aspect of the data in the classification process. We find that the resulting model, "H-PrSOMS", provides good interpretability of the classes built. The third proposal concerns an alternative statistical topological approach, MGTM-TT, based on the same paradigm as HMMs. It is a generative topographic model of observation density mixtures, similar to a hierarchical extension of the temporal GTM model. These proposals have been applied to test data and to real data from the INA (National Audiovisual Institute). This work provides, as a first step, a finer classification of audiovisual broadcast segments. As a second step, we sought to define a typology of the chaining of segments (multiple broadcasts of the same program, inter-program sequences) in order to characterize broadcast segments statistically. The overall framework provides a tool for the classification and structuring of audiovisual programs.
Aldea, Emanuel. "Apprentissage de données structurées pour l'interprétation d'images". Paris, Télécom ParisTech, 2009. http://www.theses.fr/2009ENST0053.
Pełny tekst źródła
Image interpretation methods primarily use the visual features of low-level or high-level interest elements. However, spatial information concerning the relative positioning of these elements is equally beneficial, as has been shown previously in segmentation and structure recognition. Fuzzy representations make it possible to assess at the same time the imprecision degree of a relation and the gradual transition between the satisfiability and non-satisfiability of a relation. The objective of this work is to explore techniques for representing spatial information and integrating it into the learning process, in the context of image classifiers that make use of graph kernels. We motivate our choice of labeled graphs for representing images in the context of learning with SVM classifiers. Graph kernels have been studied intensively in computational chemistry and biology, but an adaptation for image-related graphs is necessary, since image structures and the properties of the information encoded in the labeling are fundamentally different. We illustrate the integration of spatial information within the graphical model by considering fuzzy adjacency measures between interest elements, and we define a family of graph representations determined by different thresholds applied to these spatial measures. Finally, we employ multiple kernel learning in order to build classifiers that can take into account different graphical representations of the same image at once. Results show that spatial information complements the visual features of distinctive elements in images and that adapting the discriminative kernel functions to the fuzzy spatial representations is beneficial in terms of performance.