Dissertations on the topic "Gestion de données transcripomiques"
Browse the top 50 dissertations for research on the topic "Gestion de données transcripomiques".
Bouvier, Matteo. "Identification et contrôle de réseaux de régulation de gènes." Electronic Thesis or Diss., Lyon, École normale supérieure, 2023. http://www.theses.fr/2023ENSL0117.
Precise inference of Gene Regulatory Networks (GRNs) remains a challenging task in systems biology, but it would allow us to explain the processes of cellular decision-making. Previous work in our team led to an iterative GRN inference algorithm that produces not a single GRN but an ensemble of executable candidate networks. This thesis proposes a strategy for selecting a GRN from such an ensemble that relies on design of experiments. First, we introduce two Python libraries for storing and manipulating the very large datasets generated by simulating our GRNs. These libraries control the memory footprint of large, dense matrices. Then, we propose a design-of-experiments strategy for selecting networks. A small number of promising perturbations is selected by topological analysis of the GRNs. The perturbations are simulated and the most discriminative one is chosen. Finally, we developed an algorithm for controlling GRNs by determining the sequence of stimuli to apply to reach a desired cell state. A proof of concept is presented.
Medina, Marquez Alejandro. "L'analyse des données évolutives." Paris 9, 1985. https://portail.bu.dauphine.fr/fileviewer/index.php?doc=1985PA090022.
Le, Béchec Antony. "Gestion, analyse et intégration des données transcriptomiques." Rennes 1, 2007. http://www.theses.fr/2007REN1S051.
Aiming at a better understanding of diseases, transcriptomic approaches allow the analysis of several thousand genes in a single experiment. To date, international standardization initiatives have allowed the whole scientific community to use the large quantities of data generated by transcriptomic approaches, and a large number of algorithms are available to process and analyze the data sets. However, the major remaining challenge is to provide biological interpretations of these large data sets. In particular, integrating them with additional biological knowledge would certainly lead to an improved understanding of complex biological mechanisms. In my thesis work, I developed a novel and evolving environment for the management and analysis of transcriptomic data. Micro@rray Integrated Application (M@IA) allows for the management, processing and analysis of large-scale expression data sets. In addition, I elaborated a computational method to combine multiple data sources and represent networks of differentially expressed genes as interaction graphs. Finally, I used a meta-analysis of gene expression data extracted from the literature to select and combine similar studies associated with the progression of liver cancer. In conclusion, this work provides a novel tool and original analytical methodologies, thus contributing to the emerging field of integrative biology, which is indispensable for a better understanding of complex pathophysiological processes.
Maniu, Silviu. "Gestion des données dans les réseaux sociaux." Thesis, Paris, ENST, 2012. http://www.theses.fr/2012ENST0053/document.
In this thesis we address some of the issues raised by the emergence of social applications on the Web, focusing on two important directions: efficient social search in online applications and the inference of signed social links from interactions between users in collaborative Web applications. We start by considering social search in tagging (or bookmarking) applications. This problem requires a significant departure from existing, socially agnostic techniques. In a network-aware context, one can (and should) exploit the social links, which can indicate how users relate to the seeker and how much weight their tagging actions should have in the result build-up. We propose an algorithm that has the potential to scale to current applications, and validate it via extensive experiments. As social search applications can be thought of as part of a wider class of context-aware applications, we consider context-aware query optimization based on views, focusing on two important sub-problems. First, handling the possible differences in context between the various views and an input query leads to view results having uncertain scores, i.e., score ranges valid for the new context. As a consequence, current top-k algorithms are no longer directly applicable and need to be adapted to handle such uncertainty in object scores. Second, adapted view selection techniques are needed, which can leverage both the descriptions of queries and statistics over their results. Finally, we present an approach for inferring a signed network (a "web of trust") from user-generated content in Wikipedia. We investigate mechanisms by which relationships between Wikipedia contributors, in the form of signed directed links, can be inferred based on their interactions. Our study sheds light on the principles underlying a signed network that is captured by social interaction. We investigate whether this network over Wikipedia contributors indeed represents a plausible configuration of link signs, by studying its global and local network properties and, at an application level, by assessing its impact on the classification of Wikipedia articles.
Maniu, Silviu. "Gestion des données dans les réseaux sociaux." Electronic Thesis or Diss., Paris, ENST, 2012. http://www.theses.fr/2012ENST0053.
Benchkron, Said Soumia. "Bases de données et logiciels intégrés." Paris 9, 1985. https://portail.bu.dauphine.fr/fileviewer/index.php?doc=1985PA090025.
Castelltort, Arnaud. "Historisation de données dans les bases de données NoSQL orientées graphes." Thesis, Montpellier 2, 2014. http://www.theses.fr/2014MON20076.
This thesis deals with data historization in the context of graphs. Graph data have been dealt with for many years, but their exploitation in information systems, especially in NoSQL engines, is recent. The emerging Big Data and 3V contexts (Variety, Volume, Velocity) have revealed the limits of classical relational databases. Historization, for its part, has long been considered as linked only with technical and backup issues, and more recently with decision-support needs (Business Intelligence). However, historization is now taking on more and more importance in management applications. In this framework, the graph databases that are often used have received little attention regarding historization. Our first contribution consists in studying the impact of historized data in management information systems. This analysis relies on the hypothesis that historization is taking on more and more importance. Our second contribution aims at proposing an original model for managing historization in NoSQL graph databases. This proposition consists, on the one hand, in elaborating a unique and generic system for representing the history and, on the other hand, in proposing query features. We show that the system can support both simple and complex queries. Our contributions have been implemented and tested over synthetic and real databases.
Chardonnens, Anne. "La gestion des données d'autorité archivistiques dans le cadre du Web de données." Doctoral thesis, Universite Libre de Bruxelles, 2020. https://dipot.ulb.ac.be/dspace/bitstream/2013/315804/5/Contrat.pdf.
The subject of this thesis is the management of authority records for persons. The research was conducted in an archival context in transition, which was marked by the evolution of international standards of archival description and a shift towards the application of knowledge graphs. The aim of this thesis is to explore how the archival sector can benefit from the developments concerning Linked Data in order to ensure the sustainable management of authority records. Attention is not only devoted to the creation of the records and how they are made available but also to their maintenance and their interlinking with other resources. The first part of this thesis addresses the state of the art of the developments concerning the international standards of archival description as well as those regarding the Wikibase ecosystem. The second part presents an analysis of the possibilities and limits associated with an approach in which the free software Wikibase is used. The analysis is based on an empirical study carried out with data of the Study and Documentation Centre War and Contemporary Society (CegeSoma). It explores the options that are available to institutions that have limited resources and that have not yet implemented Linked Data. Datasets that contain information on people linked to the Second World War were used to examine the different stages involved in the publication of data as Linked Open Data. The experiment carried out in the second part of the thesis shows how a knowledge base driven by software such as Wikibase streamlines the creation of multilingual structured authority data. Examples illustrate how these entities can then be reused and enriched by using external data in interfaces aimed at the general public. This thesis highlights the possibilities of Wikibase, particularly in the context of data maintenance, without ignoring the limitations associated with its use. Due to its empirical nature and the formulated recommendations, this thesis contributes to the efforts and reflections carried out within the framework of the transition of archival metadata.
Doctorate in Information and Communication
Tos, Uras. "Réplication de données dans les systèmes de gestion de données à grande échelle." Thesis, Toulouse 3, 2017. http://www.theses.fr/2017TOU30066/document.
In recent years, the growing popularity of large-scale applications, e.g. scientific experiments, the Internet of Things and social networking, has led to the generation of large volumes of data. The management of this data presents a significant challenge, as the data is heterogeneous and distributed on a large scale. In traditional systems, including distributed and parallel systems, peer-to-peer systems and grid systems, achieving acceptable performance while ensuring good availability of data is a major challenge for service providers, especially when the data is distributed around the world. In this context, data replication, as a well-known technique, allows: (i) increased data availability, (ii) reduced data access costs, and (iii) improved fault-tolerance. However, replicating data on all nodes is an unrealistic solution, as it generates significant bandwidth consumption in addition to exhausting limited storage space. Defining good replication strategies is a solution to these problems. The data replication strategies that have been proposed for the traditional systems mentioned above are intended to improve performance for the user. They are difficult to adapt to cloud systems. Indeed, cloud providers aim to generate a profit in addition to meeting tenant requirements. Meeting the performance expectations of the tenants without sacrificing the provider's profit, as well as managing resource elasticity with a pay-as-you-go pricing model, are the fundamentals of cloud systems. In this thesis, we propose a data replication strategy that satisfies the requirements of the tenant, such as performance, while guaranteeing the economic profit of the provider. Based on a cost model, we estimate the response time required to execute a distributed database query. Data replication is only considered if, for any query, the estimated response time exceeds a threshold previously set in the contract between the provider and the tenant. Then, the planned replication must also be economically beneficial to the provider. In this context, we propose an economic model that takes into account both the expenditures and the revenues of the provider during the execution of any particular database query. Once replication has been decided, a heuristic placement approach is used to find the placement for new replicas in order to reduce the access time. In addition, a dynamic adjustment of the number of replicas is adopted to allow elastic management of resources. The proposed strategy is validated in an experimental evaluation carried out in a simulation environment. Compared with another data replication strategy proposed for cloud systems, the analysis of the obtained results shows that both strategies meet the performance objective for the tenant. Nevertheless, with our strategy, a replica is created only if this replication is profitable for the provider.
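The two-step test described in this abstract (a response-time threshold, then a profitability check) can be sketched as follows. This is only an illustration under stated assumptions: the class `QueryEstimate`, its fields, and the rule "profit = revenue minus expenditure, must be positive" are hypothetical and are not taken from the thesis, whose cost and economic models are not given here.

```python
from dataclasses import dataclass


@dataclass
class QueryEstimate:
    """Hypothetical per-query estimates (names are illustrative)."""
    response_time_s: float           # estimated response time from some cost model
    revenue: float                   # provider revenue for executing the query
    expenditure_with_replica: float  # provider expenditure if a new replica is created


def should_replicate(q: QueryEstimate, sla_threshold_s: float) -> bool:
    """Replicate only if the threshold is exceeded AND replication stays profitable."""
    # Step 1: replication is considered only when the agreed threshold is exceeded.
    if q.response_time_s <= sla_threshold_s:
        return False
    # Step 2 (assumed rule): the provider must still make a profit with the replica.
    return q.revenue - q.expenditure_with_replica > 0
```

For instance, `should_replicate(QueryEstimate(2.0, 10.0, 8.0), sla_threshold_s=1.0)` passes both checks, whereas the same query with an expenditure of 12.0 is rejected at the second step.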
Duquet, Mario. "Gestion des données agrométéorologiques pour l'autoroute de l'information." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2001. http://www.collectionscanada.ca/obj/s4/f2/dsk3/ftp04/MQ61339.pdf.
Rhin, Christophe. "Modélisation et gestion de données géographiques multi-sources." Versailles-St Quentin en Yvelines, 1997. http://www.theses.fr/1997VERS0010.
Zelasco, José Francisco. "Gestion des données : contrôle de qualité des modèles numériques des bases de données géographiques." Thesis, Montpellier 2, 2010. http://www.theses.fr/2010MON20232.
A Digital Surface Model (DSM) is a numerical model of a surface, formed by a set of points arranged as a grid, that is used to study a physical surface, as in Digital Elevation Models (DEMs), or in other possible applications such as a face or an anatomical organ. The precision of these models, which is of particular interest for DEMs, has been the object of several studies in recent decades. Measuring the precision of a DSM in relation to another model of the same physical surface consists in estimating the expectation of the squared differences between pairs of points, called homologous points, one in each model, which correspond to the same feature of the physical surface. But these pairs are not easily discernible, the grids may not coincide, and the differences between homologous points corresponding to benchmarks on the physical surface might be subject to special conditions, such as more careful measurement than for ordinary points, which imply a different precision. The procedure generally used to avoid these inconveniences has been to use the squares of the vertical distances between the models, which address only the vertical component of the error and thus give a biased estimate when the surface is not horizontal. The Perpendicular Distance Evaluation Method (PDEM), which avoids this bias, provides estimates of the vertical and horizontal components of the errors and is thus a useful tool for detecting discrepancies in Digital Surface Models such as DEMs. The solution includes a special reference to the simplification that arises when the error does not vary across horizontal directions. The PDEM is also assessed with DEMs obtained by means of the interferometric SAR technique.
Sandoval, Gomez Maria Del Rosario. "Conception et réalisation du système de gestion de multibases de données MUSE : architecture de schéma multibase et gestion du catalogue des données." Paris 6, 1989. http://www.theses.fr/1989PA066657.
Liroz, Miguel. "Partitionnement dans les systèmes de gestion de données parallèles." Phd thesis, Université Montpellier II - Sciences et Techniques du Languedoc, 2013. http://tel.archives-ouvertes.fr/tel-01023039.
Petit, Loïc. "Gestion de flux de données pour l'observation de systèmes." Phd thesis, Université de Grenoble, 2012. http://tel.archives-ouvertes.fr/tel-00849106.
Liroz-Gistau, Miguel. "Partitionnement dans les Systèmes de Gestion de Données Parallèles." Phd thesis, Université Montpellier II - Sciences et Techniques du Languedoc, 2013. http://tel.archives-ouvertes.fr/tel-00920615.
Gürgen, Levent. "Gestion à grande échelle de données de capteurs hétérogènes." Grenoble INPG, 2007. http://www.theses.fr/2007INPG0093.
This dissertation deals with the issues related to scalable management of heterogeneous sensor data. In fact, sensors are becoming less and less expensive, and more and more numerous and heterogeneous. This naturally raises the scalability problem and the need to integrate data gathered from heterogeneous sensors. We propose a distributed and service-oriented architecture in which data processing tasks are distributed at several levels in the architecture. Data management functionalities are provided in terms of "services", in order to hide sensor heterogeneity behind generic services. We also deal with system management issues in sensor farms, a subject not yet explored in this context.
Liroz, Gistau Miguel. "Partitionnement dans les systèmes de gestion de données parallèles." Thesis, Montpellier 2, 2013. http://www.theses.fr/2013MON20117/document.
In recent years, the volume of data that is captured and generated has exploded. Advances in computer technologies, which provide cheap storage and increased computing capabilities, have allowed organizations to perform complex analysis on this data and to extract valuable knowledge from it. This trend has been very important not only for industry, but has also had a significant impact on science, where enhanced instruments and more complex simulations call for efficient management of huge quantities of data. Parallel computing is a fundamental technique in the management of large quantities of data, as it leverages the concurrent utilization of multiple computing resources. To take advantage of parallel computing, we need efficient data partitioning techniques, which are in charge of dividing the whole data and assigning the partitions to the processing nodes. Data partitioning is a complex problem, as it has to consider different and often contradictory issues, such as data locality, load balancing and maximizing parallelism. In this thesis, we study the problem of data partitioning, particularly in scientific parallel databases that are continuously growing and in the MapReduce framework. In the case of scientific databases, we consider data partitioning in very large databases in which new data is appended continuously to the database, e.g. astronomical applications. Existing approaches are limited, since the complexity of the workload and continuous appends restrict the applicability of traditional approaches. We propose two partitioning algorithms that dynamically partition new data elements by a technique based on data affinity. Our algorithms enable us to obtain very good data partitions in a low execution time compared to traditional approaches. We also study how to improve the performance of the MapReduce framework using data partitioning techniques. In particular, we are interested in efficient data partitioning of the input datasets to reduce the amount of data that has to be transferred in the shuffle phase. We design and implement a strategy which, by capturing the relationships between input tuples and intermediate keys, obtains an efficient partitioning that can be used to significantly reduce MapReduce's communication overhead.
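As a rough illustration of affinity-based placement of newly appended data, the sketch below assigns each new element to the non-full partition with which it shares the most queries. Everything here is an assumption made for the example: the function `assign_by_affinity`, the partition layout (a dict with `size` and `queries`), and the affinity measure (number of shared query identifiers) are hypothetical and are not the algorithms proposed in the thesis.

```python
def assign_by_affinity(element_queries, partitions, capacity):
    """Place a new element in the non-full partition with the highest affinity.

    element_queries: set of query ids expected to access the new element.
    partitions: list of dicts {"size": int, "queries": set}, updated in place.
    capacity: maximum number of elements per partition (simple load balancing).
    Returns the index of the chosen partition.
    """
    best_index, best_score = None, -1
    for index, part in enumerate(partitions):
        if part["size"] >= capacity:
            continue  # skip full partitions to keep the load balanced
        # Assumed affinity measure: number of queries shared with the partition.
        score = len(element_queries & part["queries"])
        if score > best_score:
            best_index, best_score = index, score
    if best_index is None:
        # Every partition is full: open a new one.
        partitions.append({"size": 0, "queries": set()})
        best_index = len(partitions) - 1
    partitions[best_index]["size"] += 1
    partitions[best_index]["queries"] |= element_queries
    return best_index
```

The capacity bound stands in for the load-balancing concern mentioned above; a real system would weigh locality against balance more carefully.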
Etien-Gnoan, N'Da Brigitte. "L'encadrement juridique de la gestion électronique des données médicales." Thesis, Lille 2, 2014. http://www.theses.fr/2014LIL20022/document.
The electronic management of medical data covers both the simple automated processing of personal data and the sharing and exchange of health data. Its legal framework is provided both by the rules common to the automated processing of all personal data and by those specific to the processing of medical data. This management, even if it is a source of savings, raises privacy protection issues, which the French government tries to address by creating one of the best legal frameworks in the world in this field. However, major projects such as the personal health record have yet to be completed, and health law finds itself overtaken and led by technological advances. The development of e-health disrupts relationships within the one-to-one dialogue between the caregiver and the patient. The extension of patients' rights, the sharing of responsibility, the growing number of players and shared medical confidentiality pose new challenges that must now be reckoned with. Another crucial question is posed by the lack of harmonization of legislation, which increases the risks in cross-border sharing of medical
Gueye, Modou. "Gestion de données de recommandation à très large échelle." Electronic Thesis or Diss., Paris, ENST, 2014. http://www.theses.fr/2014ENST0083.
In this thesis, we address the scalability problem of recommender systems. We propose accurate and scalable algorithms. We first consider the case of matrix factorization techniques in a dynamic context, where new ratings are continuously produced. In such a case, it is not possible to have an up-to-date model, due to the incompressible time needed to compute it. This happens even if a distributed technique is used for matrix factorization. At the very least, the ratings produced during the model computation will be missing. Our solution reduces the loss of quality of the recommendations over time by introducing some stable biases which track the deviation of users' behavior. These biases are continuously updated with the new ratings, in order to maintain the quality of recommendations at a high level for a longer time. We also consider the context of online social networks and tag recommendation. We propose an algorithm that takes account of the popularity of the tags and the opinions of the users' neighborhood. But, unlike common nearest-neighbor approaches, our algorithm does not rely on a fixed number of neighbors when computing a recommendation. We use a heuristic that bounds the network traversal in a way that allows the recommendations to be computed faster while preserving their quality. Finally, we propose a novel approach that improves the accuracy of the recommendations for top-k algorithms. Instead of a fixed list size, we adjust the number of items to recommend in a way that optimizes the likelihood that all the recommended items will be chosen by the user, and find the best candidate sub-list to recommend to the user.
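To make the idea of continuously updated biases concrete, here is a minimal sketch assuming the common biased matrix factorization form (global mean plus user bias plus item bias plus a dot product of latent factors) with one stochastic gradient step per new rating. The function names, the learning rate `lr`, and the regularization `reg` are illustrative choices; the thesis's actual bias definition and update rule are not given in the abstract.

```python
def predict(global_mean, user_bias, item_bias, user_factors, item_factors):
    """Biased matrix-factorization prediction (a standard form, assumed here)."""
    dot = sum(p * q for p, q in zip(user_factors, item_factors))
    return global_mean + user_bias + item_bias + dot


def update_user_bias(user_bias, rating, prediction, lr=0.05, reg=0.02):
    """One SGD step on the user bias only; the latent factors stay frozen.

    Adjusting just the bias is cheap, so it can run on every incoming rating
    while the full factorization is recomputed offline.
    """
    error = rating - prediction
    return user_bias + lr * (error - reg * user_bias)
```

With `lr=0.1` and `reg=0.0`, a rating of 5.0 against a prediction of 4.0 moves a zero bias to 0.1, nudging later predictions for that user upward.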
Djellalil, Jilani. "Conception et réalisation de multibases de données." Lyon 3, 1989. http://www.theses.fr/1989LYO3A003.
Faye, David Célestin. "Médiation de données sémantique dans SenPeer, un système pair-à-pair de gestion de données." Phd thesis, Université de Nantes, 2007. http://tel.archives-ouvertes.fr/tel-00481311.
Cho, Choong-Ho. "Structuration des données et caractérisation des ordonnancements admissibles des systèmes de production." Lyon, INSA, 1989. http://www.theses.fr/1989ISAL0053.
This work deals, on the one hand, with the specification and modelling of databases for scheduling problems in a hierarchical architecture of manufacturing systems and, on the other hand, with the analytical specification of the set of feasible solutions for decision-support scheduling problems in three different types of workshops: first, one made up of several machines (flowshop: the sequence of operations is the same for all jobs), taking setup times as the important criterion, under set constraints (groups of tasks) and potential constraints; second, one with a single machine, under constraints given by job due dates; finally, one organised as a jobshop, under the three previous constraints: set, potential and due dates. One original piece of research concerns a new structure, PQR trees, used to characterise the set of feasible sequences of tasks.
Guégot, Françoise. "Gestion d'une base de données mixte, texte et image : application à la gestion médicale dentaire." Paris 9, 1989. https://portail.bu.dauphine.fr/fileviewer/index.php?doc=1989PA090042.
In the framework of organizational data processing, we have shown, on an actual example (a dental surgery practice), that image display constitutes a bonus which may prove decisive in decision making. This leads us to lay down the principles governing a mixed database management system. A text database will be built through a SIAD generator, which will also perform the necessary processing of the said data. An image database will be established in parallel with the former, from an inventory of the various image processing techniques. Finally, both databases will be connected to form the mixed data management system.
Le, Mahec G. "Gestion des bases de données biologiques sur grilles de calculs." Phd thesis, Université Blaise Pascal - Clermont-Ferrand II, 2008. http://tel.archives-ouvertes.fr/tel-00462306.
Pierkot, Christelle. "Gestion de la Mise à Jour de Données Géographiques Répliquées." Phd thesis, Université Paul Sabatier - Toulouse III, 2008. http://tel.archives-ouvertes.fr/tel-00366442.
The military also uses spatial data to support and inform decision-making. At each stage of a mission, geographic information of all kinds is used (digital data, paper maps, aerial photographs, and so on) to help units make their strategic choices. In addition, communication networks make it easier for producers and users in different locations to share and exchange spatial data. The information is not centralized: the data are replicated at each site, and users may occasionally be disconnected from the network, for example when a mobile unit goes out to take measurements in the field.
The main problem is therefore the management, in a military context, of a collaborative application that allows asynchronous, symmetric updating of replicated geographic data under an optimistic weak-consistency protocol. This requires defining a consistency model suited to the military context, a mechanism for detecting conflicting updates that depends on the type of data handled, and procedures for reconciling divergent writes that are adapted to the needs of the units taking part in the mission.
A review of existing work shows that several protocols have been defined in the systems community (Cederqvist, 2001; Kermarrec, 2001) and the database community (Oracle, 2003; Seshadri, 2000) to manage data replication. However, the solutions proposed often depend on the specific needs of the application and therefore cannot be reused in a different context, or they assume a reference server that centralizes the data. The mechanisms used in geographic information to manage data and updates are not suited to our study either, because they assume that the data are locked against other users until the updates have been integrated (the check-in/check-out approach; ESRI, 2004) or they rely on a centralized server holding the reference data (versioning; Cellary, 1990).
Our objective is therefore to propose solutions that allow updates to spatial data to be integrated consistently, and as automatically as possible, in an optimistic, multi-master, asynchronous replication environment.
We propose an overall strategy for integrating spatial updates, based on consistency checking coupled with update sessions. The originality of this strategy lies in its use of metadata to provide reconciliation solutions adapted to the particular context of a military mission.
The contribution of this thesis is twofold. First, it belongs to the field of spatial data update management, a field that remains very active because of the complexity and heterogeneity of the data (we nevertheless restrict our study to vector geographic data) and the relative "youth" of work on the subject. Second, it belongs to the field of consistency management for data replicated under an optimistic protocol, specifying in particular new algorithms for detecting and reconciling conflicting data in the application domain of geographic information.
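Detecting conflicting updates under optimistic, multi-master replication is commonly illustrated with version vectors. The sketch below uses that standard technique purely as an example; it is not the metadata-based detection mechanism of the thesis, and the function name `compare_versions` and the site identifiers are hypothetical.

```python
def compare_versions(a: dict, b: dict) -> str:
    """Compare two version vectors mapping site id -> update counter.

    Returns "equal", "a_newer", "b_newer", or "conflict" (concurrent updates).
    A missing site id is treated as a counter of 0.
    """
    sites = set(a) | set(b)
    a_ahead = any(a.get(s, 0) > b.get(s, 0) for s in sites)
    b_ahead = any(b.get(s, 0) > a.get(s, 0) for s in sites)
    if a_ahead and b_ahead:
        return "conflict"  # each replica saw updates the other did not
    if a_ahead:
        return "a_newer"
    if b_ahead:
        return "b_newer"
    return "equal"
```

A result of "conflict" is the point at which a reconciliation procedure would take over, for instance one driven by metadata about the data type or the unit that made the change.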
Gagnon, Bertrand. "Gestion d'information sur les procédés thermiques par base de données." Thesis, McGill University, 1986. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=65447.
Antoine, Émilien. "Gestion des données distribuées avec le langage de règles: Webdamlog." Phd thesis, Université Paris Sud - Paris XI, 2013. http://tel.archives-ouvertes.fr/tel-00908155.
Le, Mahec Gaël. "Gestion des bases de données biologiques sur grilles de calcul." Clermont-Ferrand 2, 2008. http://www.theses.fr/2008CLF21891.
Cheballah, Kamal. "Aides à la gestion des données techniques des produits industriels." Ecully, Ecole centrale de Lyon, 1992. http://www.theses.fr/1992ECDL0003.
Cobéna, Grégory. "Gestion des changements pour les données semi-structurés du Web." Palaiseau, Ecole polytechnique, 2003. http://www.theses.fr/2003EPXX0027.
Peerbocus, Mohamed Ally. "Gestion de l'évolution spatiotemporelle dans une base de données géographiques." Paris 9, 2001. https://portail.bu.dauphine.fr/fileviewer/index.php?doc=2001PA090055.
Ichiba, Abdellah. "Données radar bande X et gestion prédictive en hydrologie urbaine." Thesis, Paris Est, 2016. http://www.theses.fr/2016PESC1007/document.
The main goal of this thesis was to achieve a reliable management tool for storm water storage basins using high-resolution X-band radar. It turned out that this required several research developments. The analysed case study includes a retention basin of 10,000 m³ located in Val de Marne county, downstream of a 2.15 km² urban catchment. It has a twofold goal: storm water decontamination and flood protection by volume storage. Operationally, the management strategies associated with these two aims are conflicting; hence, a predictive management scheme has been set up: routine operation of the basin in anti-pollution mode, and a switch to flood protection mode when needed. This switch should be based on reliable short-term rainfall forecasts. A common way to respond to the operational needs of predictive management is to set up a warning system based on radar data. For example, the CALAMAR system relies on single-polarization raw radar data from the Meteo-France radar network, processed with conventional Z-R conversion methods followed by a calibration with rain gauges. However, the reliability of such warning systems has been subject to debate, often because of the questionable quality of the resulting radar rainfall estimates compared to local rain gauges. Therefore, a new methodology for a more meaningful comparison of radar rainfall field products was developed during this PhD project. Rooted in multifractal theory, it allows a comparison of the structure and morphology of rainfall fields in both space and time across scales. It was initially tested on CALAMAR and Meteo-France rainfall products before being applied, to confirm the results, to initial data from an X-band radar acquired by Ecole des Ponts ParisTech in the framework of the European project RainGain and providing data at higher resolution (up to 100 m in space and 1 min in time).
The results obtained not only highlight the crucial influence of raw data processing on the scaling behaviour, but also make it possible to pre-define the conditions under which the CALAMAR optimization may worsen the quality of rainfall estimates. Such conditions would be very difficult to detect with widely used conventional methods, which rely on a very limited number of radar pixels (only those containing rain gauges). Further extensions of the proposed methodology open new horizons for rainfall data merging. While the scientific literature, notably around the TOMACS experiment in Japan and the CASA one in the United States, highlights the operational benefits of higher-resolution rainfall measurements from X-band radars, their impact on the performance of hydrological models still remains a subject of debate. Indeed, previous research, mainly based on conceptual models, remains inconclusive. To overcome these limitations, we used two models relying on two very distinct modelling approaches: CANOE (semi-distributed and conceptual) and Multi-Hydro (a fully distributed and physically based research model developed at ENPC). An operational version of CANOE and a new, much finer configuration, which increases the sensitivity of the model to the spatio-temporal variability of small-scale rainfall, were used. Several extensions of Multi-Hydro were developed, including an optimization of its resolution, which greatly improves its overall functionality. It appears from this work that, by taking into account the spatial and temporal variability of small-scale rainfall, the performance of hydrological models can be increased by up to 20%. Overall, we believe that this dissertation contributes to the development of new, reliable, operational tools that make full use of high-resolution X-band data
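The "conventional Z-R conversion" mentioned in the abstract above is usually written as a power law Z = a·R^b between radar reflectivity Z (mm⁶ m⁻³) and rain rate R (mm/h). The following is a minimal sketch of that conversion, assuming the classic Marshall-Palmer coefficients a = 200 and b = 1.6; the abstract does not say which coefficients CALAMAR uses, and the function names are hypothetical.

```python
import math

# Assumed Marshall-Palmer coefficients; operational systems may use others.
A_COEFF = 200.0
B_COEFF = 1.6


def rain_rate_from_dbz(dbz, a=A_COEFF, b=B_COEFF):
    """Convert reflectivity in dBZ to a rain rate in mm/h using Z = a * R**b."""
    z_linear = 10.0 ** (dbz / 10.0)  # dBZ -> linear reflectivity (mm^6 m^-3)
    return (z_linear / a) ** (1.0 / b)


def dbz_from_rain_rate(rate, a=A_COEFF, b=B_COEFF):
    """Inverse conversion: rain rate in mm/h to reflectivity in dBZ."""
    return 10.0 * math.log10(a * rate ** b)


if __name__ == "__main__":
    for dbz in (20.0, 30.0, 40.0):
        print(f"{dbz:.0f} dBZ -> {rain_rate_from_dbz(dbz):.2f} mm/h")
```

Because the relation is a fixed power law, any bias in the coefficients propagates to every pixel, which is one reason a subsequent calibration against rain gauges is applied.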
Derakhshannia, Marzieh. "Gestion et optimisation de l’architecture logistique de lacs de données." Thesis, Université de Montpellier (2022-….), 2022. http://www.theses.fr/2022UMONS022.
The constantly evolving digital world has given rise to the precious concept of "data", often known as black gold. In line with this evolution, database management systems, which play an important role in data valuation, are becoming an essential element of information systems and decision-making processes. With the digital revolution, data is generated every second in huge volumes, by multiple sources and in different formats. Although managing large and dispersed data is problematic, we cannot neglect the value that could potentially be gained through raw data exploration. This heterogeneity translates into the need for an integrated system to efficiently store, process and analyze the huge amount of scattered data. The phenomenon of huge data, known as big data, requires a decision-making system with an appropriate architecture that stores heterogeneous data and supports the main characteristics of the big data environment, such as volume, variety, velocity and veracity. The data lake, a centralized storage system that receives raw data on a large scale in their native formats, is a good answer to these problems. Given this goal, it is clear that the infrastructure and architecture of the data lake have a significant impact on the profitability and functionality of the overall system. In this regard, the design and management of the data lake structure require practical and innovative methods in order to achieve an integrated and optimal centralized repository. Considering the systematic structure of the data lake as well as the hierarchical architecture of such systems, a logistical vision could lead us to the defined objectives. The supply chain is a good example of a logistics system in which hierarchical participants are coordinated within an integrated network in order to prepare a product or render services to targeted consumers.
The logistics structure, as well as supply chain management strategies, could be an innovative source of inspiration for designing, managing and optimizing a data management system based on a logistics vision. For this reason, applying an analogy between these systematic structures clarifies to what extent one could take advantage of management strategies derived from the supply chain to develop the architecture and performance of the data lake. In this thesis, we hypothesize that it is possible to describe a data lake and its functionality by comparing it to the logistical structure of a supply chain. On the basis of these objectives: First, we examine several data lake architectures and verify their effect on the performance of the data lake, in particular in relation to data governance and quality of service. Second, we introduce the supply chain, supply chain management, and the methods frequently used to optimize the supply chain. Furthermore, we compare all the elements of the data lake with those of this logistics system and focus on their similarities, in order to apply supply chain management methods to the data lake. Third, we propose a new data lake architecture based on the supply chain definition, building on the evolutionary process of modeling data lake structures. We finish this work by optimizing the proposed data lake architecture with supply chain network design strategies, and we propose methods to solve the resulting mathematical optimization model
Bourgaux, Camille. "Gestion des incohérences pour l'accès aux données en présence d'ontologies." Thesis, Université Paris-Saclay (ComUE), 2016. http://www.theses.fr/2016SACLS292/document.
The problem of querying description logic knowledge bases using database-style queries (in particular, conjunctive queries) has been a major focus of recent description logic research. An important issue that arises in this context is how to handle the case in which the data is inconsistent with the ontology. Indeed, since in classical logic an inconsistent logical theory implies every formula, inconsistency-tolerant semantics are needed to obtain meaningful answers. This thesis aims to develop methods for dealing with inconsistent description logic knowledge bases using three natural semantics (AR, IAR, and brave) previously proposed in the literature and that rely on the notion of a repair, which is an inclusion-maximal subset of the data consistent with the ontology. In our framework, these three semantics are used conjointly to identify answers with different levels of confidence. In addition to developing efficient algorithms for query answering over inconsistent DL-Lite knowledge bases, we address three problems that should support the adoption of this framework: (i) query result explanation, to help the user to understand why a given answer was (not) obtained under one of the three semantics, (ii) query-driven repairing, to exploit user feedback about errors or omissions in the query results to improve the data quality, and (iii) preferred repair semantics, to take into account the reliability of the data. For each of these three topics, we developed a formal framework, analyzed the complexity of the relevant reasoning problems, and proposed and implemented algorithms, which we empirically studied over an inconsistent DL-Lite benchmark we built. Our results indicate that even if the problems related to dealing with inconsistent DL-Lite knowledge bases are theoretically hard, they can often be solved efficiently in practice by using tractable approximations and features of modern SAT solvers
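The three repair-based semantics named in the abstract above can be illustrated on a toy example. This is a brute-force sketch, not one of the thesis's algorithms: it enumerates every repair, which is exponential in the number of facts. The ontology, the facts, and all function names are hypothetical. An answer holds under brave semantics if some repair entails it, under AR if every repair entails it, and under IAR if the intersection of all repairs entails it.

```python
from itertools import combinations

# Hypothetical toy ontology: concept inclusions and one disjointness axiom.
INCLUSIONS = {"Prof": {"Person"}, "Student": {"Person"}}
DISJOINT = [("Prof", "Student")]


def saturate(facts):
    """Close a set of (concept, individual) facts under the inclusions."""
    closure = set(facts)
    changed = True
    while changed:
        changed = False
        for concept, individual in list(closure):
            for parent in INCLUSIONS.get(concept, ()):
                if (parent, individual) not in closure:
                    closure.add((parent, individual))
                    changed = True
    return closure


def is_consistent(facts):
    """A fact set is consistent if no individual belongs to disjoint concepts."""
    closure = saturate(facts)
    return not any(
        (c1, ind) in closure and (c2, ind) in closure
        for c1, c2 in DISJOINT
        for _, ind in closure
    )


def repairs(facts):
    """Enumerate inclusion-maximal consistent subsets, largest first."""
    facts = list(facts)
    found = []
    for size in range(len(facts), -1, -1):
        for subset in combinations(facts, size):
            candidate = frozenset(subset)
            if is_consistent(candidate) and not any(candidate < r for r in found):
                found.append(candidate)
    return found


def answers(facts, query):
    """Return whether an atomic query holds under brave, AR and IAR semantics."""
    reps = repairs(facts)
    core = frozenset.intersection(*reps)
    return {
        "brave": any(query in saturate(r) for r in reps),
        "AR": all(query in saturate(r) for r in reps),
        "IAR": query in saturate(core),
    }


DATA = [("Prof", "ann"), ("Student", "ann"), ("Prof", "bob")]
```

With this data, `Person(ann)` holds under AR but not under IAR, because each repair keeps one of the two conflicting facts about `ann` while their intersection keeps neither. This is the sense in which the semantics give different levels of confidence.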
Dia, Amadou Fall. "Filtrage sémantique et gestion distribuée de flux de données massives." Electronic Thesis or Diss., Sorbonne université, 2018. http://www.theses.fr/2018SORUS495.
Our daily use of the Internet and related technologies generates, at rapid and variable speeds, large volumes of heterogeneous data from sensor networks, search engine logs, multimedia content sites, weather forecasting, geolocation, Internet of Things (IoT) applications, etc. Processing such data in conventional databases (Relational Database Management Systems) may be very expensive in terms of time and memory storage resources. To respond effectively to the needs of rapid decision-making, these streams require real-time processing. Data Stream Management Systems (DSMSs) evaluate queries on the recent data of a stream within structures called windows. The input data come in different formats such as CSV, XML, RSS, or JSON. This heterogeneity barrier stems from the nature of the data streams and must be overcome. To this end, several research groups have taken advantage of semantic web technologies (RDF and SPARQL) by proposing RDF stream processing systems called RSPs. However, large volumes of RDF data, high input stream rates, concurrent queries, the combination of RDF streams with large volumes of stored RDF data, and expensive processing drastically reduce the performance of these systems. A new approach is required to considerably reduce the processing load of RDF data streams. In this thesis, we propose several complementary solutions to reduce the processing load in a centralized environment. An on-the-fly sampling approach for streams of RDF graphs is proposed to reduce the data and processing load while preserving semantic links. This approach is extended with a graph-oriented summary approach that extracts the most relevant information from RDF graphs using centrality measures from social network analysis. We also adopt a compressed format for RDF data and propose an approach for querying compressed RDF data without a decompression phase.
To ensure parallel and distributed data stream management, the presented work also proposes two solutions for reducing the processing load in a distributed environment: an engine and approaches for the parallel and distributed processing of streams of RDF graphs. Finally, an optimized processing approach for operations combining static and dynamic data is also integrated into a new distributed management system for streams of RDF graphs
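The "windows" over which stream queries are evaluated, as mentioned in the abstract above, can be sketched with a time-based sliding window. This is an illustrative sketch only: the class name, the window width, and the eviction rule (drop items at least `width` time units older than the newest one) are assumptions, not the design of any system described in the thesis.

```python
from collections import deque


class SlidingWindow:
    """Keep only the stream items whose timestamp lies within the last `width` units."""

    def __init__(self, width):
        self.width = width
        self.items = deque()  # (timestamp, value) pairs in arrival order

    def push(self, timestamp, value):
        """Add a new item and evict everything that fell out of the window."""
        self.items.append((timestamp, value))
        while self.items and self.items[0][0] <= timestamp - self.width:
            self.items.popleft()

    def values(self):
        """Return the values currently visible to a query."""
        return [value for _, value in self.items]


if __name__ == "__main__":
    window = SlidingWindow(width=10)
    for ts, triple in [(0, "s1 p o1"), (5, "s2 p o2"), (12, "s3 p o3")]:
        window.push(ts, triple)
    print(window.values())
```

A continuous query would be re-evaluated over `values()` each time the window slides, so reducing what enters the window (for example by sampling) directly reduces the processing load.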
Ben, Dhia Imen. "Gestion des grandes masses de données dans les graphes réels." Thesis, Paris, ENST, 2013. http://www.theses.fr/2013ENST0087/document.
In the last few years, we have been witnessing a rapid growth of networks in a wide range of applications such as social networking, bio-informatics, the semantic web, road maps, etc. Most of these networks can be naturally modeled as large graphs. Managing, analyzing, and querying such data has become a very important issue and has attracted extensive interest within the database community. In this thesis, we address the problem of efficiently answering distance queries in very large graphs. We propose EUQLID, an efficient algorithm to answer distance queries on very large directed graphs. This algorithm exploits some interesting properties that real-world graphs exhibit. It is based on an efficient variant of the seminal 2-hop algorithm. We conducted an extensive set of experiments against state-of-the-art algorithms, which show that our approach outperforms existing approaches and that distance queries can be processed within hundreds of milliseconds on very large real-world directed graphs. We also propose an access control model for social networks which can make use of EUQLID to scale to very large graphs. This model allows users to specify fine-grained privacy policies based on their relations with other users in the network. We describe and demonstrate Primates, a prototype which enforces the proposed access control model and allows users to specify their privacy preferences via a user-friendly graphical interface
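The general idea behind 2-hop labeling, on which the abstract above says EUQLID is based, can be sketched as follows. Each vertex stores its distance to and from a set of hub vertices, and a distance query is answered by taking the minimum over the hubs shared by the source's outgoing label and the target's incoming label. This is a naive illustration, not EUQLID: it uses every vertex as a hub, which makes the labels exact but far too large for real graphs. All names are hypothetical.

```python
from collections import deque


def bfs(adj, source):
    """Unweighted shortest-path distances from `source` in a directed graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def build_labels(adj, hubs):
    """Build L_out[v] = {hub: d(v, hub)} and L_in[v] = {hub: d(hub, v)}."""
    reverse = {}
    for u, neighbours in adj.items():
        for v in neighbours:
            reverse.setdefault(v, []).append(u)
    l_out, l_in = {}, {}
    for hub in hubs:
        for v, d in bfs(adj, hub).items():      # distances hub -> v
            l_in.setdefault(v, {})[hub] = d
        for v, d in bfs(reverse, hub).items():  # distances v -> hub
            l_out.setdefault(v, {})[hub] = d
    return l_out, l_in


def distance(l_out, l_in, s, t):
    """Answer d(s, t) from the labels alone; None means t is unreachable."""
    common = l_out.get(s, {}).keys() & l_in.get(t, {}).keys()
    return min((l_out[s][h] + l_in[t][h] for h in common), default=None)


GRAPH = {"a": ["b"], "b": ["c"], "c": ["d"], "d": []}
L_OUT, L_IN = build_labels(GRAPH, hubs=list(GRAPH))
```

Practical variants aim to keep the labels small by choosing few, well-placed hubs, for example high-degree vertices in real-world graphs, while still covering every shortest path.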
Ben, Dhia Imen. "Gestion des grandes masses de données dans les graphes réels." Electronic Thesis or Diss., Paris, ENST, 2013. http://www.theses.fr/2013ENST0087.
In the last few years, we have been witnessing a rapid growth of networks in a wide range of applications such as social networking, bio-informatics, the semantic web, road maps, etc. Most of these networks can be naturally modeled as large graphs. Managing, analyzing, and querying such data has become a very important issue and has attracted extensive interest within the database community. In this thesis, we address the problem of efficiently answering distance queries in very large graphs. We propose EUQLID, an efficient algorithm to answer distance queries on very large directed graphs. This algorithm exploits some interesting properties that real-world graphs exhibit. It is based on an efficient variant of the seminal 2-hop algorithm. We conducted an extensive set of experiments against state-of-the-art algorithms, which show that our approach outperforms existing approaches and that distance queries can be processed within hundreds of milliseconds on very large real-world directed graphs. We also propose an access control model for social networks which can make use of EUQLID to scale to very large graphs. This model allows users to specify fine-grained privacy policies based on their relations with other users in the network. We describe and demonstrate Primates, a prototype which enforces the proposed access control model and allows users to specify their privacy preferences via a user-friendly graphical interface
Aouiche, Kamel. "Techniques de fouille de données pour l'optimisation automatique des performances des entrepôts de données." Lyon 2, 2005. http://theses.univ-lyon2.fr/documents/lyon2/2005/aouiche_k.
With the development of databases in general and data warehouses in particular, it has become very important to reduce the administration burden. The aim of self-administering systems is to administer and adapt themselves automatically, without loss of performance or even with a gain. The idea of using data mining techniques to extract knowledge useful for administration from the data themselves has been in the air for some years. However, as far as we know, no such research has come to fruition. It nevertheless remains a very promising approach, notably in the field of data warehousing, where queries are very heterogeneous and cannot be interpreted easily. The aim of this thesis is to study self-administration techniques in databases and data warehouses, mainly performance optimization techniques such as indexing and view materialization, and to look for a way of extracting from the stored data themselves the knowledge needed to apply these techniques. We have designed a tool that finds an index and view configuration that optimizes data access time. Our tool searches for frequent itemsets in a given workload and clusters the query workload to compute this index and view configuration. Finally, we have extended performance optimization to XML data warehouses. In this area, we proposed an indexing technique that precomputes joins between XML facts and dimensions, and we adapted our materialized view selection strategy to XML materialized views
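The step of searching for frequent itemsets in a query workload, described in the abstract above, can be sketched as follows. Each query is reduced to the set of attributes it restricts, and attribute sets that occur in at least a given fraction of the queries become index candidates. This is a brute-force sketch under stated assumptions: the workload, the attribute names, and the support threshold are hypothetical, and the enumeration of all subsets is only viable for small attribute sets. The abstract does not say which mining algorithm the tool uses.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(workload, min_support):
    """Return attribute sets appearing in at least `min_support` of the queries.

    `workload` is a list of attribute sets, one per query; itemsets are
    returned as sorted tuples mapped to their absolute occurrence count.
    """
    counts = Counter()
    for attributes in workload:
        items = sorted(set(attributes))
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    threshold = min_support * len(workload)
    return {combo: n for combo, n in counts.items() if n >= threshold}


# Hypothetical workload: attributes used in the WHERE clause of four queries.
WORKLOAD = [
    {"year", "region"},
    {"year", "region", "product"},
    {"year"},
    {"product"},
]

CANDIDATES = frequent_itemsets(WORKLOAD, min_support=0.5)
```

Here the pair `("region", "year")` is frequent, which would suggest a composite index on those two attributes as one candidate; a real tool would then weigh candidates against storage and maintenance costs.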
De, Vlieger P. "Création d'un environnement de gestion de base de données " en grille ". Application à l'échange de données médicales." Phd thesis, Université d'Auvergne - Clermont-Ferrand I, 2011. http://tel.archives-ouvertes.fr/tel-00654660.
De, Vlieger Paul. "Création d'un environnement de gestion de base de données "en grille" : application à l'échange de données médicales." Phd thesis, Université d'Auvergne - Clermont-Ferrand I, 2011. http://tel.archives-ouvertes.fr/tel-00719688.
Mizi, Mohammed. "Conception et réalisation d'un système de gestion de bases de formulaires." Lyon, INSA, 1991. http://www.theses.fr/1991ISAL0055.
This work deals with the development of a form base management system (FBMS) that uses all the tools and internal mechanisms of a DBMS, built upon the universal relation concept, to design and manipulate forms. We focus on the problems of the design, description, and manipulation of forms and of applications that combine a set of forms. A form is described from the structure of the relations of the database, which allows the creation of sample schemas from which the form structure is obtained by composition (using inheritance rules). This approach is enriched by an extension of the relational model (generalisation/specialization) that offers flexibility through object sharing and version manipulation. Manipulation is carried out with the tools inherited from the coupling with a relational DBMS and with specific modules: calculation, evaluation, recording, searching, restitution, etc. The management of historical data makes it possible to retrieve the appropriate occurrence of a form, without alteration, when the database is updated. A form is an object of the database. An occurrence of a form is generated from one or more relations. An application is defined by the forms that make up its interface and the data it uses. All the forms of an application are stored in the database.
Bellosta, Marie-Jo. "Systèmes d'interfaces pour la gestion d'objets persistants, Omnis." Paris 6, 1992. http://www.theses.fr/1992PA066034.
Diène, Aly Wane. "Contribution à la gestion de structures de données distribuées et scalables." Paris 9, 2001. https://portail.bu.dauphine.fr/fileviewer/index.php?doc=2001PA090039.
Le, Trung-Dung. "Gestion de masses de données dans une fédération de nuages informatiques." Thesis, Rennes 1, 2019. http://www.theses.fr/2019REN1S101.
Cloud federations can be seen as major progress in cloud computing, in particular in the medical domain. Indeed, sharing medical data would improve healthcare. Federating resources makes it possible to access any information, even for a mobile person whose hospital data are distributed over several sites. Besides, it enables us to consider larger volumes of data on more patients and thus provide finer statistics. Medical data usually conform to the Digital Imaging and Communications in Medicine (DICOM) standard. DICOM files can be stored on different platforms, such as Amazon, Microsoft, Google Cloud, etc. The management of these files, including sharing and processing, on such platforms follows the pay-as-you-go model, according to distinct pricing models and relying on various systems (relational database management systems, or DBMSs, or NoSQL systems). In addition, DICOM data can be structured following traditional (row or column) or hybrid (row-column) data storage layouts. As a consequence, medical data management in cloud federations raises Multi-Objective Optimization Problems (MOOPs) for (1) query processing and (2) data storage, according to users' preferences, related to various measures such as response time, monetary cost, quality, etc. These problems are complex to address because of the heterogeneous database engines, the variability (due to virtualization, large-scale communications, etc.) and the high computational complexity of a cloud federation. To solve these problems, we propose a MedIcal system on clouD federAtionS (MIDAS). First, MIDAS extends IReS, an open source platform for complex analytics workflows executed over multi-engine environments, to solve MOOPs across heterogeneous database engines. Second, we propose an algorithm for estimating cost values in a cloud environment, called Dynamic REgression AlgorithM (DREAM).
This approach adapts to the variability of the cloud environment by changing the size of the data used for training and testing, so as to avoid using expired system information. Third, the Non-dominated Sorting Genetic Algorithm based on Grid partitioning (NSGA-G) is proposed to address the large candidate space of MOOPs. NSGA-G aims to find an approximate optimal solution while improving the quality of the optimal Pareto set of the MOOP. In addition to query processing, we propose to use NSGA-G to find an approximate optimal solution for DICOM data configuration. We provide experimental evaluations to validate DREAM and NSGA-G on various test problems and datasets. DREAM is compared with other machine learning algorithms in terms of the accuracy of its estimated costs. The quality of NSGA-G is compared to that of other NSGAs on many problems in the MOEA framework. NSGA-G is also tested on the DICOM dataset to find optimal solutions. Experimental results show the good quality of our solutions for estimating and optimizing multi-objective problems in a cloud federation
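The notion of a Pareto set, central to the multi-objective problems described in the abstract above, can be sketched with a plain non-dominance filter. This is not NSGA-G, which is a genetic algorithm with grid partitioning; it only shows what "non-dominated" means when every objective is minimized. The plan costs below are invented for illustration, with the two objectives assumed to be response time and monetary cost.

```python
def dominates(a, b):
    """True if plan `a` is at least as good as `b` everywhere and better somewhere.

    All objectives are assumed to be minimized.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Keep the points that no other point dominates (quadratic-time filter)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical query plans as (response time in s, monetary cost in cents).
PLANS = [(10, 5), (8, 7), (12, 4), (9, 9), (8, 6)]

FRONT = pareto_front(PLANS)
```

The filter compares every pair of plans, which is fine for a handful of candidates but not for the large candidate spaces the abstract mentions; that gap is what evolutionary approaches such as the NSGA family try to bridge by approximating the front.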
Resseguier, Noémie. "Méthodes de gestion des données manquantes en épidémiologie. : Application en cancérologie." Thesis, Aix-Marseille, 2013. http://www.theses.fr/2013AIXM5063.
The issue of how to deal with missing data in epidemiological studies is a topic which concerns every researcher involved in the analysis of collected data and in the interpretation of the results produced by these analyses. And even if the issue of the handling of missing data and of their impact on the validity of the results is often discussed, simple, but not always appropriate methods to deal with missing data are commonly used. The use of each of these methods is based on some hypotheses under which the obtained results are valid, but it is not always possible to test these hypotheses. The objective of this work was (i) to propose a review of various methods to handle missing data used in the field of epidemiology, and to discuss the advantages and disadvantages of each of these methods, (ii) to propose a strategy of analysis in order to study the robustness of the results obtained via classical methods to handle missing data to the departure from hypotheses which are required for the validity of these results, although they are not testable, and (iii) to propose some applications on real data of the issues discussed in the first two sections
Hajji, Hicham. "Gestion des risques naturels : une approche fondée sur l'intégration des données." Lyon, INSA, 2005. http://theses.insa-lyon.fr/publication/2005ISAL0039/these.pdf.
A huge amount of geographic data is available, since many organizations have been collecting geographic data for centuries. Some of it is still in the form of paper maps or in traditional files or databases, while, with the emergence of the latest software and data storage technologies, some has been digitized and is stored in recent GIS systems. However, reusing these data for new applications is too often a nightmare, owing to the diversity of data sets and the heterogeneity of existing systems in terms of data modeling concepts, data encoding techniques, obscure data semantics, storage structures, access functionality, etc. Such difficulties are more common in natural hazards information systems. In order to support advanced natural hazards management based on heterogeneous data, this thesis develops a new approach to the integration of semantically heterogeneous geographic information which is capable of addressing the spatial and thematic aspects of geographic information. The approach is based on the OpenGIS standard, which it uses as a common model for data integration. The proposed methodology takes into consideration a large number of the aspects involved in the construction and modelling of a natural hazards management information system. Another issue addressed in this thesis is the design of an ontology for natural hazards. Ontology design has been extensively studied in recent years; throughout this work, we have tried to propose an ontology that deals with the semantic heterogeneity existing between different actors and models the existing knowledge on this issue. The ontology contains the main concepts and the relationships between these concepts, expressed in the OWL language
Mokadem, Riad. "Signatures algébriques dans la gestion de structures de données distribuées et scalables." Paris 9, 2006. https://portail.bu.dauphine.fr/fileviewer/index.php?doc=2006PA090014.
Recent years have seen the emergence of new architectures involving multiple computers. New concepts have been proposed. Among the most popular are those of a multicomputer or of a Network of Workstations and, more recently, of Peer-to-Peer and Grid Computing. This thesis consists of the design, implementation and performance measurement of a prototype SDDS manager, called SDDS-2005. It manages key-based ordered files in the distributed RAM of Windows machines forming a grid or P2P network. Our scheme can back up the RAM of each storage node onto the local disk. Our goal is to write only the data that have changed since the last backup. We are also interested in record updates and non-key searches (scans). Their common denominator is the application of the properties of a new signature scheme that we call algebraic signatures, which are useful in this context. One then needs to find only the areas of the bucket that have changed since the last backup. Our signature-based scheme for updating records at the SDDS client should prove its advantages in client-server based database systems in general. It holds the promise of interesting possibilities for transactional concurrency control, beyond the mere avoidance of lost updates. Thanks to algebraic signatures, we also update only the data that have changed. In addition, the partly pre-computed algebraic signature of a string encodes each symbol by its cumulative signature. This encoding protects the SDDS data against incidental viewing by an unauthorized server administrator. The method appears attractive: it does not imply any storage overhead. It is also completely transparent to the servers and takes place at the client. Next, our scheme provides fast string search (match) directly on encoded data at the SDDS servers. These searches appear to be an alternative to known Karp-Rabin type schemes. Scans can explore the storage nodes in parallel. They match the records by their entire non-key content or by a substring, prefix, longest common prefix or longest common string.
The search complexity is almost O(1) for prefix search. One may also use them to detect and localize silent corruption. These features should be of interest to P2P and grid computing. Then, we propose a novel string search algorithm called n-Gramme search. It appears to be among the fastest known, e.g., probably often the fastest one we know. It costs only a small fraction of existing record matches, especially for searches of larger strings. The experiments prove the high efficiency of our implementation. Our backup scheme is substantially more efficient with algebraic signatures. The signature calculus is itself substantially faster, the gain being about 30%. Experiments also show that our cumulative pre-computation notably accelerates string searches, which are faster than with the partial one, at the expense of higher encoding/decoding overhead. They are new alternatives to known Karp-Rabin type schemes, and are likely to be usually faster. The speed of string matches opens interesting perspectives for the popular join, group-by, rollup, and cube database operations. Our work has been the subject of five publications in international conferences [LMS03, LMS05a, LMS05b, ML06, l&al06]. For convenience, we have included the latest publications. Also, the package termed SDDS-2005 is available for non-commercial use at http://ceria.dauphine.fr/. It builds on earlier versions of the prototype, the cumulative effort of several contributors, and on the n-Gramme algorithm implementation. We have also presented our proposed prototype, SDDS-2005, at the Microsoft Research Academic Days 2006
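The signature-based backup described in the abstract above (write only the areas of a bucket that changed since the last backup) can be sketched as follows. This is an illustration under stated assumptions, not the SDDS-2005 implementation: it assumes the common formulation of an algebraic signature as the sum of p_i·α^i over a Galois field, here GF(2^8) with the primitive polynomial 0x11D and α = 2, and it assumes a page-based bucket layout. The field size, the indexing convention, and all names are hypothetical. In a field, changing a single symbol of a page always changes this signature, which is the property the sketch relies on.

```python
def gf_mul(a, b, poly=0x11D):
    """Multiply two bytes in GF(2^8) modulo the given primitive polynomial."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result


def signature(page, alpha=2):
    """One-symbol algebraic signature: XOR-sum of page[i] * alpha**i in GF(2^8)."""
    sig, power = 0, 1
    for symbol in page:
        sig ^= gf_mul(symbol, power)
        power = gf_mul(power, alpha)
    return sig


def page_signatures(bucket, page_size):
    """Signature of every fixed-size page of a bucket (a bytes object)."""
    return [
        signature(bucket[start:start + page_size])
        for start in range(0, len(bucket), page_size)
    ]


def changed_pages(old_signatures, bucket, page_size):
    """Indices of the pages whose signature differs from the last backup."""
    new_signatures = page_signatures(bucket, page_size)
    return [i for i, (old, new) in enumerate(zip(old_signatures, new_signatures))
            if old != new]
```

A backup pass would store `page_signatures(...)` alongside the disk copy and later rewrite only the pages returned by `changed_pages(...)`. A one-byte signature can still miss some multi-symbol changes, so a practical scheme would use longer signatures.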
El, Khalkhali Imad. "Système intégré pour la modélisation, l'échange et le partage des données de produits." Lyon, INSA, 2002. http://theses.insa-lyon.fr/publication/2002ISAL0052/these.pdf.
In Virtual Enterprise and Concurrent Engineering environments, a wide variety of information is used. A crucial issue is data communication and exchange between heterogeneous systems and distant sites. To solve this problem, the STEP project was introduced. The STandard for the Exchange of Product model data (STEP) is an evolving international standard for the representation and exchange of product data. The objective of STEP is to provide an unambiguous, computer-interpretable representation of product data in all phases of the product's lifecycle. In collaborative product development, different types of experts in different disciplines are concerned with the product (Design, Manufacturing, Marketing, Customers, etc.). Each of these experts has their own viewpoint on the same product. STEP models are unable to represent the experts' viewpoints. The objective of our research work is to propose a methodology for the representation and integration of different experts' viewpoints in the design and manufacturing phases. An information infrastructure for modelling, exchanging and sharing product data models is also proposed
Bame, Ndiouma. "Gestion de donnée complexes pour la modélisation de niche écologique." Electronic Thesis or Diss., Paris 6, 2015. http://www.theses.fr/2015PA066125.
This thesis concerns large-scale biodiversity data management. Its objective is to optimize queries for researchers who have free access to worldwide biodiversity data. These data, which are shared by research laboratories around the world, are federated in the GBIF data warehouse. GBIF makes its data accessible to researchers, policy makers and the general public. With a significant amount of data, and rapid growth in both data and users expressing new needs, the GBIF portal faces a double problem of query expressiveness and efficiency. We therefore propose a decentralized solution for querying biodiversity data. Our solution combines the resources of several remote machines with limited capacity to provide the computing and storage power needed to keep the system responsive for users. It also provides a high-level query interface that is more expressive for users. Then, we propose an on-demand dynamic data distribution approach. This approach, which is based on data properties and on the characteristics of users' analysis queries, dynamically adapts machine capacities to user demand. Then, we propose a query optimization approach that dynamically adapts data placement and machine loads according to performance, in order to process users' queries within deadlines. We experimentally validated our solution with real GBIF data comprising 100 million observation records