Dissertations / Theses on the topic 'Entrepôt de données de santé'
Khnaisser, Christina. "Méthode de construction d'entrepôt de données temporalisé pour un système informationnel de santé." Mémoire, Université de Sherbrooke, 2016. http://hdl.handle.net/11143/8386.
Kempf, Emmanuelle. "Structuration, standardisation et enrichissement par traitement automatique du langage des données relatives au cancer au sein de l’entrepôt de données de santé de l’Assistance Publique – Hôpitaux de Paris." Electronic Thesis or Diss., Sorbonne université, 2023. http://www.theses.fr/2023SORUS694.
Cancer is a public health issue for which the improvement of care relies, among other levers, on the use of clinical data warehouses (CDWs). Their use involves overcoming obstacles such as the quality, standardization and structuring of the care data stored there. The objective of this thesis was to demonstrate that it is possible to address the challenges of the secondary use of data on cancer patients from the Assistance Publique - Hôpitaux de Paris (AP-HP) CDW, for purposes as varied as monitoring the safety and quality of care and performing observational and experimental clinical research. First, the identification of a minimal data set made it possible to concentrate the effort of formalizing the items of interest specific to the discipline. From 15 identified items, 4 use cases with distinct medical perspectives were successfully developed: automation of the safety- and quality-of-care calculations required for the international certification of health establishments; clinical epidemiology on the impact of public health measures during a pandemic on delays in cancer diagnosis; decision support for optimizing patient recruitment in clinical trials; and development of neural networks for prognostication by computer vision. A second condition necessary for CDW use in oncology is the optimal and interoperable formalization of this minimal data set across several CDWs. As part of the French PENELOPE initiative, which aims to improve patient recruitment in clinical trials, the thesis assessed the added value of the oncology extension of the OMOP common data model. Version 5.4 of OMOP made it possible to double the rate of formalization of prescreening criteria for phase I to IV clinical trials. Only 23% of these criteria could be automatically queried on the AP-HP CDW, and this with a positive predictive value of less than 30%. This work suggested a novel methodology for evaluating the performance of a recruitment support system: based on the usual metrics (sensitivity, specificity, positive and negative predictive values), but also on additional indicators characterizing the adequacy between the chosen model and the CDW (rate of translation and execution of queries). Finally, the work showed how natural language processing applied to the structuring of CDW data could enrich the minimal data set, based on the baseline tumor dissemination assessment at cancer diagnosis and on the histoprognostic characteristics of tumors. The comparison of textual extraction performance metrics, and of the human and technical resources necessary to develop rule-based and machine learning systems, made it possible to favor the first approach in a certain number of situations. The thesis identified automatic rule-based preannotation before a manual annotation phase for training a machine learning model as an optimizable approach. Rules seemed sufficient for the textual extraction of a certain typology of entities that are well characterized at the lexical and semantic levels. Anticipating and modeling this typology upstream of the textual extraction phase could make it possible to differentiate, for each type of entity, to what extent machine learning should replace rules. The thesis demonstrated that close attention to a certain number of data science challenges allows the efficient use of a CDW for various purposes in oncology.
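To make the rule-based extraction favored above concrete, here is a minimal sketch of regex preannotation of TNM stage mentions; the pattern, labels and example sentence are assumptions for illustration, not the rules actually deployed on the AP-HP CDW.

```python
import re

# Hypothetical rule-based preannotation of TNM stage mentions in French
# clinical notes; pattern and label are illustrative only.
TNM_PATTERN = re.compile(r"\b(?:c|p|yp)?T([0-4Xx])\s*N([0-3Xx])\s*M([01Xx])\b")

def preannotate(text: str) -> list[dict]:
    """Return candidate TNM entities with character offsets."""
    entities = []
    for m in TNM_PATTERN.finditer(text):
        entities.append({
            "label": "TNM_STAGE",
            "start": m.start(),
            "end": m.end(),
            "text": m.group(0),
            "T": m.group(1), "N": m.group(2), "M": m.group(3),
        })
    return entities

if __name__ == "__main__":
    note = "Adénocarcinome colique classé pT3 N1 M0 après chirurgie."
    print(preannotate(note))  # one TNM_STAGE entity spanning 'pT3 N1 M0'
```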
Lamer, Antoine. "Contribution à la prévention des risques liés à l’anesthésie par la valorisation des informations hospitalières au sein d’un entrepôt de données." Thesis, Lille 2, 2015. http://www.theses.fr/2015LIL2S021/document.
Introduction: Hospital Information Systems (HIS) manage and record, every day, millions of data items related to patient care: biological results, vital signs, drug administrations, care processes, and so on. These data are stored by operational applications that provide remote access and a comprehensive picture of the Electronic Health Record. They may also be reused for other purposes, such as clinical research or public health, particularly when integrated into a data warehouse. Several studies have highlighted a statistical link between compliance with quality indicators related to the anesthesia procedure and patient outcome during the hospital stay. In the University Hospital of Lille, these quality indicators, as well as patient comorbidities during the post-operative period, can be assessed with data collected by applications of the HIS. The main objective of this work is to integrate data collected by operational applications in order to carry out clinical research studies. Methods: First, the quality of the data recorded by the operational applications is evaluated with methods proposed by the literature or developed in this work. Data quality problems highlighted by this evaluation are then handled during the integration step of the ETL process. New data are computed and aggregated in order to provide indicators of quality of care. Finally, two studies demonstrate the usability of the system. Results: Relevant data from the HIS have been integrated into an anesthesia data warehouse. This system has stored data about hospital stays and interventions (drug administrations, vital signs, etc.) since 2010. Aggregated data have been developed and used in two clinical research studies. The first study highlighted a statistical link between induction and patient outcome. The second study evaluated compliance with quality indicators of ventilation and its impact on comorbidity. Discussion: The data warehouse and the cleaning and integration methods developed as part of this work allow statistical analysis to be performed on more than 200,000 interventions. This system can be fed by other applications used in the CHRU of Lille, but also by the Anesthesia Information Management Systems used in other hospitals.
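The aggregation step can be pictured with a short sketch turning raw vital-sign records into a per-intervention indicator; the column names and the hypotension threshold below are assumptions of the sketch, not the Lille definitions.

```python
import pandas as pd

# Illustrative aggregation of intra-operative vital signs into a per-
# intervention quality indicator; schema and threshold are invented.
signs = pd.DataFrame({
    "intervention_id": [1, 1, 1, 2, 2],
    "timestamp": pd.to_datetime(
        ["2010-01-01 08:00", "2010-01-01 08:05", "2010-01-01 08:10",
         "2010-01-02 09:00", "2010-01-02 09:05"]),
    "mean_arterial_pressure": [85, 52, 90, 88, 91],
})

def hypotension_indicator(group: pd.DataFrame, threshold: float = 65.0) -> float:
    """Fraction of measurements below the hypotension threshold."""
    return (group["mean_arterial_pressure"] < threshold).mean()

indicator = signs.groupby("intervention_id").apply(hypotension_indicator)
print(indicator)  # intervention 1 -> ~0.33, intervention 2 -> 0.0
```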
Bouba, Fanta. "Système d'information décisionnel sur les interactions environnement-santé : cas de la Fièvre de la Vallée du Rift au Ferlo (Sénégal)." Thesis, Paris 6, 2015. http://www.theses.fr/2015PA066461/document.
Our research is part of the QWeCI European project (Quantifying Weather and Climate Impacts on Health in Developing Countries, EU FP7), in partnership with UCAD, the CSE and the IPD, around the theme of environmental health, with a practical case on vector-borne diseases in Senegal, particularly Rift Valley Fever (RVF). The health of human and animal populations is often strongly influenced by the environment. Moreover, research on the spread factors of vector-borne diseases such as RVF considers this issue in both its physical and socio-economic dimensions. First described in 1912-1913 in Kenya, RVF is a viral anthropozoonosis widespread in tropical regions; it mainly concerns animals, but humans can also be affected. In Senegal, the risk area mainly concerns the Senegal River Valley and the sylvopastoral areas of the Ferlo. With its Sahelian climate, the Ferlo has many ponds that are sources of water supply for humans and livestock, but also breeding sites for potential RVF vectors. Controlling RVF, which stands at the crossroads of three large systems (agro-ecological, pathogenic, economic/health/social), necessarily entails the consideration of several parameters, if one wants first to understand the emergence mechanisms, but also to work on risk modeling. Our work focuses on the decision-making process needed to quantify the use of health and environmental data in impact assessment for RVF monitoring. The research teams involved produce data during their investigation periods and laboratory analyses. This growing flood of data should be stored and prepared for correlated studies with new storage techniques such as data warehouses. For data analysis, it is not enough to rely only on conventional techniques such as statistics: the contribution on this issue is moving towards predictive analysis, combining aggregate storage techniques and processing tools, and discovering information therefore requires data mining. Furthermore, the evolution of the disease is strongly linked to the spatio-temporal dynamics of the different environmental actors (vectors, viruses and hosts), which is why we rely on spatio-temporal patterns to identify and measure the interactions between environmental parameters and the actors involved. With this decision-making process, we obtained several results: (i) following the formalization of multidimensional modeling, we built an integrated data warehouse that includes all the objects involved in managing the health risk, a model that can be generalized to other vector-borne diseases; (ii) despite a very wide variety of mosquitoes, Culex neavei, Aedes ochraceus and Aedes vexans are the potential RVF vectors most present in the study area during the rainy season, which is most prone to suspected cases, and the risk period remains the month of October; (iii) the analyzed ponds show almost the same behavior, but significant variations exist at some points. This research shows once again the interest of discovering relationships between environmental data and RVF with data mining methods for the spatio-temporal monitoring of the risk of emergence.
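The multidimensional model behind result (i) can be pictured as a small star schema; the table names, pond names and counts below are invented for illustration and do not reproduce the thesis's warehouse.

```python
import sqlite3

# Toy star schema in the spirit of the integrated data warehouse:
# one fact table (mosquito captures) joined to pond and time dimensions.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_pond (pond_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE fact_capture (
    pond_id INTEGER, time_id INTEGER, species TEXT, mosquito_count INTEGER
);
INSERT INTO dim_pond VALUES (1, 'Pond A'), (2, 'Pond B');
INSERT INTO dim_time VALUES (1, 'September'), (2, 'October');
INSERT INTO fact_capture VALUES
    (1, 1, 'Aedes vexans', 120), (1, 2, 'Aedes vexans', 310),
    (2, 2, 'Culex neavei', 95);
""")

# OLAP-style roll-up: captures per month, all ponds and species combined.
for row in con.execute("""
    SELECT t.month, SUM(f.mosquito_count)
    FROM fact_capture f JOIN dim_time t ON f.time_id = t.time_id
    GROUP BY t.month ORDER BY t.month
"""):
    print(row)  # ('October', 405), ('September', 120)
```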
Bottani, Simona. "Machine learning for neuroimaging using a very large scale clinical datawarehouse." Electronic Thesis or Diss., Sorbonne université, 2022. http://www.theses.fr/2022SORUS110.
Machine learning (ML) and deep learning (DL) have been widely used for the computer-aided diagnosis (CAD) of neurodegenerative diseases. The main limitation of these tools is that they have mostly been validated on research data sets that are very different from clinical routine ones. Clinical data warehouses (CDWs) give access to such clinical data. This PhD work consisted in applying ML/DL algorithms to data originating from the CDW of the Greater Paris area in order to validate the CAD of neurodegenerative diseases. Thanks to the manual annotation of 5500 images, we developed an automatic approach for the quality control (QC) of T1-weighted (T1w) brain magnetic resonance images (MRI) from a clinical data set. QC is fundamental, as insufficient image quality can prevent CAD systems from working properly. In a second work, we focused on the homogenization of T1w brain MRIs from a CDW: we proposed to homogenize this large clinical data set by converting images acquired after the injection of gadolinium into non-contrast-enhanced images. Lastly, we assessed whether ML/DL algorithms could detect dementia in a CDW using T1w brain MRI. We identified the population of interest using ICD-10 codes, studied how the imbalance of the training sets may bias the results, and proposed strategies to attenuate these biases.
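One generic way to attenuate class imbalance, shown below as a sketch, is to weight the training loss inversely to class frequency; the class counts are made up and this is not necessarily the exact strategy used in the thesis.

```python
import torch
import torch.nn as nn

# Class-weighted cross-entropy for an imbalanced dementia classifier.
# The counts below are invented for the sketch.
n_per_class = torch.tensor([900.0, 100.0])        # [controls, dementia]
weights = n_per_class.sum() / (2.0 * n_per_class)  # [~0.56, 5.0]

criterion = nn.CrossEntropyLoss(weight=weights)

logits = torch.randn(8, 2)            # batch of 8 mock predictions
labels = torch.randint(0, 2, (8,))    # mock ground-truth labels
loss = criterion(logits, labels)
print(float(loss))  # errors on the minority class now cost ~9x more
```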
Loizillon, Sophie. "Deep learning for automatic quality control and computer-aided diagnosis in neuroimaging using a large-scale clinical data warehouse." Electronic Thesis or Diss., Sorbonne université, 2024. https://accesdistant.sorbonne-universite.fr/login?url=https://theses-intra.sorbonne-universite.fr/2024SORUS258.pdf.
Patients' hospitalisation generates data about their health, which is essential to ensure that they receive the best possible care. Over the last decade, clinical data warehouses (CDWs) have been created to exploit this vast amount of clinical information for research purposes. CDWs offer remarkable potential for research by bringing together a huge amount of real-world data of diverse nature (electronic health records, imaging data, pathology and laboratory tests...) from up to millions of patients. Access to such large clinical routine datasets, which are an excellent representation of what is acquired daily in clinical practice, is a major advantage for the development and deployment of powerful artificial intelligence models in clinical routine. Currently, most computer-aided diagnosis models are limited by training performed only on research datasets, with patients meeting strict inclusion criteria and data acquired under highly standardised research protocols, which differ considerably from the realities of clinical practice. This gap between research and clinical data leads AI systems to generalise poorly to clinical practice. This thesis examined how to leverage brain MRI data from a clinical data warehouse for research purposes. Because images gathered in a CDW are highly heterogeneous, especially regarding their quality, we first focused on developing an automated solution capable of effectively identifying corrupted images. We improved the automated quality control of 3D T1-weighted brain MRI initially developed by Bottani et al. (2021) by proposing an innovative transfer learning method leveraging artefact simulation. In a second work, we extended our automatic quality control from T1-weighted MRI to another common anatomical sequence, 3D FLAIR. As machine learning models are sensitive to distribution shifts, we proposed a semi-supervised domain adaptation framework: our automatic quality control tool was able to identify images that are not proper 3D FLAIR brain MRIs and to assess overall image quality with a limited number of new manual annotations of FLAIR images. Lastly, we conducted a feasibility study assessing the potential of variational autoencoders for unsupervised anomaly detection. We obtained promising results showing a correlation between Fazekas scores and the volumes of the lesions segmented by our model, as well as the robustness of the method to image quality. Nevertheless, we still observed failure cases where no lesion at all is detected in lesional cases, which prevents this type of model from being used in clinical routine for now. Clinical data warehouses are an incredible research ecosystem that can enable a better understanding of the health of the general population and, in the long term, contribute to the development of predictive and preventive medicine, but their use for research purposes is not without difficulties.
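The anomaly-detection idea can be sketched with a toy variational autoencoder on flattened patches: anomalies reconstruct poorly, so reconstruction error serves as the anomaly score. Sizes, weights and data below are toy assumptions, not the thesis's architecture.

```python
import torch
import torch.nn as nn

# Minimal VAE-style anomaly scorer on flattened image patches.
class TinyVAE(nn.Module):
    def __init__(self, d=64, z=8):
        super().__init__()
        self.enc = nn.Linear(d, 2 * z)   # outputs mean and log-variance
        self.dec = nn.Linear(z, d)

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        eps = torch.randn_like(mu)
        zlat = mu + eps * (0.5 * logvar).exp()  # reparameterization trick
        return self.dec(zlat), mu, logvar

model = TinyVAE()
healthy = torch.randn(16, 64)                  # stand-in "normal" patches
recon, mu, logvar = model(healthy)
kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=-1).mean()
loss = nn.functional.mse_loss(recon, healthy) + 1e-3 * kl  # training objective
score = (recon - healthy).pow(2).mean(dim=-1)  # per-patch anomaly score
print(float(loss), score.shape)  # torch.Size([16]); high scores flag anomalies
```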
Dony, Philippe. "Création d'un entrepôt de données en anesthésie : potentiel pour la gestion et la santé publique." Doctoral thesis, Université Libre de Bruxelles, 2018. https://dipot.ulb.ac.be/dspace/bitstream/2013/279599/3/TM.pdf.
Doctorat en Santé Publique
Nguyen, Benjamin. "Construction et évolution d'un entrepôt de données sur la toile." Paris 11, 2003. http://www.theses.fr/2003PA112283.
Our work is to be placed in the general context of the creation of a framework to discover, analyse, process, store, integrate and query information found on the Web. We begin with a review of the state of the art concerning the following problems: querying information on the Web, managing the evolution of a warehouse, and document clustering techniques. In this thesis, we study the construction and evolution of a Web warehouse. We propose, on the one hand, a methodology for designing such a warehouse and, on the other, we study the functionalities it should possess. We present the results of two projects in which we took part, Xyleme and Thesus. The goal of the Xyleme project was to manage all the XML pages of the Web, from crawling and fetching to querying; we detail in this work the monitoring of pages and their temporal evolution. The goal of the Thesus project was to create thematic collections of Web pages based on the analysis of page semantics, using various tools including link analysis and clustering algorithms. Both projects have been implemented, and our monitoring module is used in industry by the company Xyleme S.A. These two experiments provided a general framework for deeper reflection on how to design a thematic warehouse, which is detailed and illustrated by the SPIN prototype.
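Thematic clustering of pages can be illustrated generically (this is not the Thesus algorithm itself): TF-IDF vectors grouped with k-means, on invented snippets.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Generic thematic clustering of web pages: TF-IDF + k-means.
pages = [
    "heart failure treatment hospital cardiology",
    "cardiology clinic patient heart disease",
    "xml query language web data integration",
    "semantic web rdf data query",
]
X = TfidfVectorizer().fit_transform(pages)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)  # e.g. [0 0 1 1]: two thematic collections
```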
Mathieu, Jean. "Intégration de données temps-réel issues de capteurs dans un entrepôt de données géo-décisionnel." Thesis, Université Laval, 2011. http://www.theses.ulaval.ca/2011/28019/28019.pdf.
In the last decade, the use of sensors for measuring various phenomena has greatly increased. We can now use sensors to measure GPS position, temperature and even a person's heartbeat. Nowadays, the wide diversity of sensors makes them the best tools to gather data. Along with this effervescence, analysis tools have also advanced since the creation of transactional databases, leading to a new category of tools, Business Intelligence (BI) systems, which respond to the need for global analysis of the data. Data warehouses and OLAP (On-Line Analytical Processing) tools, which belong to this category, enable users to analyze big volumes of data, execute time-based requests and build statistical graphs in a few simple mouse clicks. Although the various types of sensor can surely enrich any analysis, such data require heavy integration processes before they can be driven into the data warehouse, the centerpiece of any decision-making process. The different data types produced by sensors, the sensor models and the ways to transfer such data are, even today, significant obstacles to the integration of sensor data streams into a geo-decisional data warehouse. Moreover, current geo-decisional data warehouses are not initially built to welcome new data at a high frequency. Since the performance of a data warehouse is reduced during an update, new data are usually added weekly, monthly, etc. However, some data warehouses, called Real-Time Data Warehouses (RTDW), can be updated several times a day without their performance diminishing during the process. But this technology is still not very common, is costly, and is in most cases considered to be at a "beta" stage. Therefore, this research aims to develop an approach for publishing and normalizing real-time sensor data streams and integrating them into a classic data warehouse. An optimized update strategy has also been developed so that new data can be added frequently to the analysis without affecting the data warehouse's performance.
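A micro-batching sketch conveys the flavor of such an update strategy: readings accumulate in a buffer and are flushed to the warehouse in one transaction. The schema, batch size and flush rule are assumptions of the sketch, not the strategy developed in the thesis.

```python
import queue
import sqlite3

# Buffered loading of a sensor stream into a classic warehouse table.
buffer: "queue.Queue[tuple]" = queue.Queue()
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE fact_reading (sensor_id TEXT, ts TEXT, value REAL)")

def on_reading(sensor_id: str, ts: str, value: float, batch_size: int = 3):
    buffer.put((sensor_id, ts, value))
    if buffer.qsize() >= batch_size:          # flush threshold reached
        batch = [buffer.get() for _ in range(buffer.qsize())]
        with con:                             # single transaction per batch
            con.executemany("INSERT INTO fact_reading VALUES (?,?,?)", batch)

for i in range(7):
    on_reading("temp-01", f"2011-06-01T10:0{i}", 20.0 + i)
print(con.execute("SELECT COUNT(*) FROM fact_reading").fetchone())  # (6,)
```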
Atigui, Faten. "Approche dirigée par les modèles pour l’implantation et la réduction d’entrepôts de données." Thesis, Toulouse 1, 2013. http://www.theses.fr/2013TOU10044/document.
Our work deals with decision support systems based on multidimensional data warehouses (DW). A DW is a large amount of data, often historical, used for complex and sophisticated analyses; it supports the business processes within an organization. The data relevant to the decision-making process are collected from data sources by means of software processes commonly known as ETL (Extraction-Transformation-Loading) processes. The study of existing systems and methods reveals two major limits. When building a DW, the designer deals with two major issues: the first is the design of the DW itself, while the second is the design of the ETL processes. Current frameworks provide partial solutions that focus either on the multidimensional structure or on the ETL processes, yet both could benefit from each other; few studies have considered these issues in a unified framework and provided solutions to automate all of these tasks. Moreover, from its creation, a DW holds a large amount of data, mainly due to the historical data, and decision makers' analyses over time show that they are usually less interested in old data. To overcome these shortcomings, this thesis aims to formalize the development of a time-varying DW (with a temporal dimension) from its design to its physical implementation. We use Model Driven Engineering (MDE), which automates the process and thus significantly reduces development costs and improves software quality. The contributions of this thesis are summarized as follows: (1) formalizing and automating the development of a time-varying DW within a model-driven approach that provides a set of unified (conceptual, logical and physical) metamodels describing data and transformation operations, an OCL (Object Constraint Language) extension that conceptually formalizes the transformation operations, a set of transformation rules that map the conceptual model to the logical and physical models, and a set of transformation rules that generate the code; (2) formalizing and automating historical data reduction within a model-driven approach that provides a set of (conceptual, logical and physical) metamodels describing the reduced data, a set of reduction operations, and a set of transformation rules that implement these operations at the physical level. To validate our proposals, we developed a prototype composed of three parts: the first transforms models into lower-level models, the second transforms the physical model into code, and the last performs the DW reduction.
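The model-to-code step can be pictured with a toy transformation: a (fictional) physical metamodel instance is turned into SQL DDL. The schema and the temporal validity columns are assumptions of the sketch, not the thesis's metamodels.

```python
# Toy MDE-style transformation: physical model (a dict) -> SQL code.
star_schema = {
    "fact_admission": ["patient_id INTEGER", "stay_length REAL"],
    "dim_patient": ["patient_id INTEGER PRIMARY KEY", "birth_year INTEGER"],
}

def to_ddl(schema: dict, temporal: bool = True) -> str:
    stmts = []
    for table, columns in schema.items():
        cols = list(columns)
        if temporal:  # time-varying DW: each row carries its validity period
            cols += ["valid_from DATE", "valid_to DATE"]
        stmts.append(f"CREATE TABLE {table} (\n  " + ",\n  ".join(cols) + "\n);")
    return "\n".join(stmts)

print(to_ddl(star_schema))
```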
Aknouche, Rachid. "Entrepôt de textes : de l'intégration à la modélisation multidimensionnelle de données textuelles." Thesis, Lyon 2, 2014. http://www.theses.fr/2014LYO20025.
The work presented in this thesis aims to propose solutions to the problems of textual data warehousing. The interest in textual data is motivated by the fact that they cannot be integrated and warehoused using traditional applications and the current techniques of decision-making systems. To overcome this problem, we propose a text warehouse approach that covers the main phases of a data warehousing process adapted to textual data. We focus specifically on the integration of textual data and on their multidimensional modeling. For the integration of textual data, we use information retrieval (IR) techniques and automatic natural language processing (NLP). Thus, we propose an integration framework, called ETL-Text, which is an ETL (Extract-Transform-Load) process suited to textual data. ETL-Text performs the extraction, filtering and transformation of the original textual data into a form allowing them to be warehoused; some of these tasks are performed by our RICSH approach (contextual information retrieval by topic segmentation of documents) for the pretreatment and searching of textual data. The organization of textual data for analysis is carried out by our proposed TWM (Text Warehouse Modelling), a new multidimensional model suited to textual data. It extends the classical constellation model to support the representation of textual data in a multidimensional environment. TWM includes a semantic dimension defined for structuring documents and topics by organizing semantic concepts into a hierarchy. We also rely on Wikipedia, as an external semantic source, to build the semantic part of the model. Furthermore, we developed WikiCat, a tool that feeds the TWM semantic dimension with semantic descriptors from Wikipedia. These last two contributions complement the ETL-Text framework to establish the text warehouse system. To validate the different contributions, we performed, besides the implementation work, an experimental study for each model. Finally, to handle the emergence of large data volumes, we developed, as part of a case study, parallel processing algorithms using the MapReduce paradigm, tested in the Apache Hadoop environment.
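The MapReduce pattern mentioned above can be mimicked in pure Python as a sketch (the real jobs run on a Hadoop cluster); the documents are invented.

```python
from functools import reduce
from collections import Counter

# MapReduce-style term counting, the canonical word-count pattern.
docs = ["text warehouse olap", "semantic text dimension", "olap text"]

def mapper(doc: str) -> Counter:
    return Counter(doc.split())            # map: doc -> partial counts

def reducer(a: Counter, b: Counter) -> Counter:
    a.update(b)                            # reduce: merge partial counts
    return a

term_counts = reduce(reducer, map(mapper, docs), Counter())
print(term_counts.most_common(2))  # [('text', 3), ('olap', 2)]
```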
Sautot, Lucile. "Conception et implémentation semi-automatique des entrepôts de données : application aux données écologiques." Thesis, Dijon, 2015. http://www.theses.fr/2015DIJOS055/document.
This thesis concerns the semi-automatic design of data warehouses and of the associated OLAP cubes for analyzing ecological data. The biological sciences, including ecology and agronomy, generate data that require an important collection effort: several years are often needed to obtain a complete data set. Moreover, the objects and phenomena studied by these sciences are complex, and many parameters must be recorded for them to be understood. Finally, the collection of complex data over a long time results in an increased risk of inconsistency. Thus, these sciences generate numerous and heterogeneous data, which can be inconsistent. It is therefore interesting to offer scientists working in the life sciences information systems able to store and restore their data, particularly when those data have a significant volume. Among the existing tools, business intelligence tools, including On-Line Analytical Processing (OLAP) systems, particularly caught our attention, because they are data analysis processes working on large historical collections (i.e. data warehouses) to support decision making. Business intelligence offers tools that allow users to explore large volumes of data in order to discover patterns and knowledge within the data, and possibly confirm their hypotheses. However, OLAP systems are complex information systems whose implementation requires advanced skills in business intelligence. Thus, although they have interesting features for managing and analyzing multidimensional data, their complexity makes them difficult to handle for potential users who are not computer scientists. In the literature, several studies have examined automatic multidimensional design, but the examples provided by these works concern traditional data; other articles address multidimensional modeling adapted to complex data (inconsistency, heterogeneous data, spatial objects, texts, images within a warehouse...), but the proposed methods are rarely automatic. The aim of this thesis is to provide an automatic design method for data warehouses and OLAP cubes that is able to take into account the inherent complexity of biological data. To test the prototypes proposed in this thesis, we prepared a data set concerning bird abundance along the Loire. This data set is structured as follows: (1) the census of 213 bird species (described by a set of qualitative factors, such as diet) at 198 points along the river over 4 census campaigns; (2) each of the 198 points is described by a set of environmental variables from different sources (land surveys, satellite images, GIS). These environmental variables raise the most important issues in terms of multidimensional modeling: they come from different sources, sometimes independent of the bird census campaigns, and are inconsistent in time and space. Moreover, they are heterogeneous: qualitative factors, quantitative variables or spatial objects. Finally, these environmental data include a large number of attributes (158 selected variables) (...)
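A first step of semi-automatic design can be sketched naively: qualitative columns become candidate dimensions and quantitative ones candidate measures. The data set below is invented and the heuristic is far simpler than the thesis's method.

```python
import pandas as pd

# Naive dimension/measure detection from column types.
census = pd.DataFrame({
    "species": ["Alcedo atthis", "Ardea cinerea"],
    "diet": ["piscivore", "piscivore"],
    "point_id": [12, 47],
    "abundance": [3, 8],
    "water_temp": [14.2, 15.1],
})

dimensions = [c for c in census.columns if census[c].dtype == object]
measures = [c for c in census.columns
            if pd.api.types.is_numeric_dtype(census[c]) and c != "point_id"]
print(dimensions)  # ['species', 'diet'] -> candidate dimension hierarchies
print(measures)    # ['abundance', 'water_temp'] -> candidate fact measures
```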
El Sarraj, Lama. "Exploitation d'un entrepôt de données guidée par des ontologies : application au management hospitalier." Thesis, Aix-Marseille, 2014. http://www.theses.fr/2014AIXM4331.
This research is situated in the domain of data warehouse (DW) personalization and concerns DW user assistance. Specifically, we are interested in assisting a user during online analysis processes in using the existing operational resources. The application of this research concerns hospital management, for hospital governance, and is limited to the scope of the French Program of Medicalization of Information Systems (PMSI). This research was supported by the Public Hospitals of Marseille (APHM). Our proposal is a semantic approach based on ontologies. The support system implementing this approach, called the Ontology-based Personalization System (OPS), is based on a knowledge base operated by a personalization engine. The knowledge base is composed of three ontologies: a domain ontology, an ontology of the DW structure, and an ontology of resources. The personalization engine allows, first, a personalized search of DW resources based on the user's profile and, second, for a particular resource, an expansion of the search by recommending new resources based on the context of that resource. To recommend new resources, we proposed three possible strategies. To validate our proposal, a prototype of the OPS system was developed, with a personalization engine implemented in Java. This engine exploits an OWL knowledge base composed of three interconnected OWL ontologies. We illustrate the approach with three experimental scenarios related to the PMSI, defined with APHM domain experts.
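Profile-driven recommendation can be pictured with a miniature stand-in: the resources, roles and tags below are invented, and the real OPS reasons over OWL ontologies rather than Python dicts.

```python
# Toy profile-driven recommendation over a miniature resource catalogue.
resources = {
    "dashboard_stays": {"topic": "hospital stays", "roles": {"manager"}},
    "report_coding": {"topic": "PMSI coding", "roles": {"coder", "manager"}},
    "cube_mortality": {"topic": "mortality", "roles": {"physician"}},
}

def recommend(role: str, current: str) -> list[str]:
    """Resources matching the user's role, excluding the one in use."""
    return [name for name, meta in resources.items()
            if role in meta["roles"] and name != current]

print(recommend("manager", current="dashboard_stays"))  # ['report_coding']
```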
Dugré, Mathieu. "Conception et réalisation d'un entrepôt de données : intégration à un système existant et étape nécessaire vers le forage de données." Thèse, Université du Québec à Trois-Rivières, 2004. http://depot-e.uqtr.ca/4679/1/000108834.pdf.
Cavalier, Mathilde. "La propriété des données de santé." Thesis, Lyon, 2016. http://www.theses.fr/2016LYSE3071/document.
The question of the protection and enhancement of health data is constantly renewed because these data lie at the center of conflicting interests. Legal, health and economic logics confront one another and express themselves through a particularly heterogeneous set of regulations on health data. Property rights here seem able to reconcile issues that at first sight appear contradictory. Given the place of this right in our legal system and the uniqueness of health data, their reconciliation deserves a study of some magnitude. The first step is to assess the compatibility of this right with health data: the answer requires a simplified vision of property, only to find that existing rights over data already amount to property rights which, because of the particularity of health data, are largely limited. The question then arises of the relevance of applying "more complete" property rights to health data. We note, however, that the specificity of health data implies that such a solution is not the most effective for achieving a fair balance between patients and data collectors. Nevertheless, other solutions are possible.
Mbarki, Mohamed. "Gestion de l'hétérogénéité documentaire : le cas d'un entrepôt de documents multimédia." Toulouse 3, 2008. http://thesesups.ups-tlse.fr/185/.
The knowledge society is based on three axes: the diffusion and use of information via new technologies, the deduction of knowledge induced by this information, and the economic impact that can result from it. Offering the actors of this society, and more particularly its "decision makers", tools that enable them to produce and manage knowledge, or at least "elements of knowledge", seems rather difficult to ensure. This difficulty is due to the dynamism of the environment and to the diversity of factors influencing information production, extraction and communication. Indeed, this information is included in documents collected from disseminated sources (Internet, workflows, digital libraries, etc.). These documents are thus heterogeneous in content and in form: they can be related to various fields, be more or less structured, have various structures, contain several types of media, and be stored on several types of supports. The current challenge is to design new applications that exploit this document heterogeneity. With these needs in mind, the work presented in this thesis aims to face these challenges, and in particular to propose solutions for "managing and creating knowledge" from the integration of all the information available in heterogeneous documents. The handling of multimedia document repositories constitutes the applicative framework of our proposals. Our approach is articulated around three complementary axes: (1) the representation, (2) the storage (or integration) and (3) the exploitation of heterogeneous documents. Document representation concerns the determination of the information that must be preserved and the way it must be organized to better apprehend and anticipate its uses. The solution we chose to meet these needs is based on the proposal of a document model that integrates several overlapping and complementary levels of description (a generic layer and a specific one, a logical description and a semantic one).
Royer, Kevin. "Vers un entrepôt de données et des processus : le cas de la mobilité électrique chez EDF." Thesis, Chasseneuil-du-Poitou, Ecole nationale supérieure de mécanique et d'aérotechnique, 2015. http://www.theses.fr/2015ESMA0001/document.
Nowadays, the electric vehicle (EV) market is undergoing rapid expansion and has become of great importance for utility companies such as EDF. In order to fulfill its objectives (demand optimization, pricing, etc.), EDF has to extract and analyze heterogeneous data from EVs and charging spots. To tackle this, we used data warehousing (DW) technology serving as a basis for business processes (BP). To avoid the garbage-in/garbage-out phenomenon, data had to be formatted and standardized, and we chose to rely on an ontology to deal with the heterogeneity of the data sources. Because the construction of an ontology can be a slow process, we proposed a modular and incremental construction of the ontology based on bricks. We based our DW on the ontology, which makes its construction an incremental process as well. To load data into this particular DW, we defined the ETL (Extract, Transform & Load) process at the semantic level. We then designed recurrent BP with BPMN (Business Process Model and Notation) specifications to extract the knowledge required by EDF. The assembled DW contains data and BP that are both described in a semantic context. We implemented our solution on the OntoDB platform, developed at the ISAE-ENSMA Laboratory of Computer Science and Automatic Control for Systems. The solution allowed us to manipulate the ontology, the data and the BP homogeneously through the OntoQL language. Furthermore, we added to the platform the capacity to automatically execute any BP described in BPMN. Ultimately, we were able to provide EDF with a tailor-made platform based on declarative elements adapted to its needs.
Najjar, Ahmed. "Forage de données de bases administratives en santé." Doctoral thesis, Université Laval, 2017. http://hdl.handle.net/20.500.11794/28162.
Current health systems are increasingly equipped with data collection and storage systems, and a huge amount of data is therefore stored in medical databases. These databases, designed for administrative or billing purposes, are fed with new data whenever the patient uses the healthcare system. This specificity makes them an extremely interesting and rich source of information: they capture the constraints of reality across a great variety of real medical care situations, and could thus support the modeling of the medical treatment process. However, despite their obvious interest, these administrative databases are still underexploited by researchers. In this thesis, we propose a new approach for mining administrative data in order to detect patterns in patient care trajectories. First, we propose an algorithm able to cluster complex objects that represent medical services. These objects are characterized by a mixture of numerical, categorical and multivalued categorical variables; we propose to extract one projection space for each multivalued variable and to modify the computation of the distance between objects to take these projections into account. Second, a two-step mixture model is proposed to cluster these objects, using the Gaussian distribution for the numerical variables, the multinomial distribution for the categorical variables and hidden Markov models (HMM) for the multivalued variables. We thereby obtain two algorithms able to cluster complex objects characterized by a mixture of variable types. Once this stage is reached, an approach for discovering patterns in care trajectories is set up, involving the following steps: (1) preprocessing, which builds and generates the sets of medical services, yielding three sets, one for hospital stays, one for consultations and one for visits; (2) modeling of treatment processes as successions of medical service labels; these complex processes require a sophisticated clustering method, and we propose a clustering algorithm based on HMM; (3) an approach for visualizing and analyzing the trajectory patterns in order to interpret the discovered models. Together, these steps constitute a knowledge discovery process for medical administrative databases. We apply this approach to data on patients over 65 years old who live in the province of Quebec and suffer from heart failure. The data are extracted from three databases: the MSSS MED-ÉCHO database, the RAMQ database and the database containing death certificate data. The results clearly demonstrate the effectiveness of our approach, detecting patterns that can help healthcare administrators to better manage health treatments.
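Clustering sequences of service labels can be pictured with a simpler stand-in for the HMM-based method: each trajectory is summarized by its label-transition profile and then clustered. Trajectories, labels and cluster counts below are invented.

```python
import numpy as np
from sklearn.cluster import KMeans

# Care trajectories as sequences of service labels.
services = {"consultation": 0, "hospital_stay": 1, "home_visit": 2}
trajectories = [
    ["consultation", "hospital_stay", "hospital_stay", "home_visit"],
    ["consultation", "hospital_stay", "home_visit", "home_visit"],
    ["consultation", "consultation", "consultation", "home_visit"],
]

def transition_features(seq: list[str]) -> np.ndarray:
    """Flattened, normalized matrix of label-to-label transition counts."""
    counts = np.zeros((3, 3))
    for a, b in zip(seq, seq[1:]):
        counts[services[a], services[b]] += 1
    return counts.flatten() / max(len(seq) - 1, 1)

X = np.array([transition_features(t) for t in trajectories])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)  # e.g. [0 0 1]: hospital-centred vs consultation-centred paths
```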
Pinilla, Erwan. "Données de santé, dynamiques et enjeux de souveraineté." Electronic Thesis or Diss., Strasbourg, 2023. http://www.theses.fr/2023STRAA015.
The aim of this research is to identify the dynamics of health data in the field of digital sovereignty: who can use such data to describe and explain situations, predict trends, and induce the behaviour of individuals, populations, or even States? What is, and what should be, legally protected, and how? We report on and analyze the overflowing of historical approaches to regulation, due to the diversification of players, techniques and uses; the multiplication of data sources and their dissemination; the shaking of legal categories despite their recent establishment; and the porosity of national and joint systems, due to conventional or aggressive interactions. As a result, we analyze the accelerated advent of new rules at the European level in the traditionally regalian fields of cyber infrastructure, qualifications (data, technologies, uses) and mutual guarantees against interference. Other challenges, such as reidentification and synthetic data, call for in-depth insight, in an era where technological domination has long ceased to be the prerogative of States and where geopolitics has been extended by new tools and practices.
Zorn, Caroline. "Données de santé et secret partagé : pour un droit de la personne à la protection de ses données de santé partagées." Thesis, Nancy 2, 2009. http://www.theses.fr/2009NAN20011.
The shared medical secret is a legal exception to professional secrecy: it allows a patient's caregivers to exchange the health information relevant to that patient's care without being punished for revealing confidential information. That caregivers discuss a patient's health information with the other medical professionals involved in that patient's care is to the benefit of the patient. Nonetheless, there is a fine balance to be struck between a "need-to-know" professional exchange of information, which is essential to patient care, and a broad exchange of information, which may ultimately compromise the confidentiality of the patient's private life. The emergence of electronic tools, which multiply the possibilities for data exchange, further disrupts this balance. Consequently, the handling of this shared health information must be subject to medical professional secrecy, to the "Informatique et Libertés" legislation, and to the numerous norms and standards defined for the French national electronic medical record (DMP), the pharmaceutical record (Dossier pharmaceutique) and the reimbursement repository (Historique des remboursements). As the patient's health information is increasingly shared between healthcare providers, through means such as the DMP or the DP, the patient's right and ability to control access to his or her health information must become more and more important. A study of the importance of obtaining the patient's consent led to the following proposal: to inscribe in the French Constitution the patient's right to confidentiality regarding health information.
Denis, Marie-Chantal. "Conception et réalisation d'un entrepôt de données institutionnel dans une perspective de support à la prise de décision." Thèse, Université du Québec à Trois-Rivières, 2008. http://depot-e.uqtr.ca/1267/1/030077904.pdf.
Lelong, Romain. "Accès sémantique aux données massives et hétérogènes en santé." Thesis, Normandie, 2019. http://www.theses.fr/2019NORMR030/document.
Clinical data are produced during medical practice by different health professionals, in several places and in various formats. They therefore present heterogeneity both in nature and in structure, and are furthermore of a particularly large volume, which makes them qualify as Big Data. The work carried out in this thesis aims at proposing an effective information retrieval method for this type of complex and massive data. First, access to clinical data is constrained by the need to model clinical information. This can be done within Electronic Health Records and, to a larger extent, within data warehouses. In this thesis, I propose a proof of concept of a search engine allowing access to the information contained in the Semantic Health Data Warehouse of the Rouen University Hospital. A generic data model allows this data warehouse to view information as a graph of data, thus making it possible to model the information while preserving its conceptual complexity. In order to provide search functionalities adapted to this generic representation, a query language allowing access to clinical information through the various entities of which it is composed was developed and implemented as part of this thesis. Second, the massiveness of clinical data is also a major technical challenge that hinders efficient information retrieval. The initial implementation of the proof of concept highlighted the limits of relational database management systems when used in the context of clinical data, and a migration to a NoSQL key-value store was then completed. Although offering good atomic data access performance, this migration nevertheless required additional developments and the design of a suitable hardware and applicative architecture to provide advanced search functionalities. Finally, the contribution of this work within the general context of the Semantic Health Data Warehouse of the Rouen University Hospital was evaluated. The proof of concept was used to access semantic descriptions of information in order to check the inclusion and exclusion criteria of patients in clinical studies; in this evaluation, a total or partial response was given for 72.97% of the criteria. In addition, the genericity of the tool also made it possible to use it in other contexts, such as documentary and bibliographic information retrieval in health.
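Viewing clinical information as a graph of data can be pictured with a tiny adjacency-list sketch and a path query standing in for the thesis's query language; the entity names and relations are invented.

```python
# Clinical information as a graph: (node, relation) -> target nodes.
graph = {
    ("patient:42", "has_stay"): ["stay:7"],
    ("stay:7", "has_diagnosis"): ["icd10:I50"],   # heart failure
    ("stay:7", "has_lab"): ["lab:creatinine"],
}

def follow(start: str, *relations: str) -> list[str]:
    """Follow a chain of relations from a starting entity."""
    frontier = [start]
    for rel in relations:
        frontier = [t for node in frontier
                    for t in graph.get((node, rel), [])]
    return frontier

print(follow("patient:42", "has_stay", "has_diagnosis"))  # ['icd10:I50']
```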
Michel, Franck. "Intégrer des sources de données hétérogènes dans le Web de données." Thesis, Université Côte d'Azur (ComUE), 2017. http://www.theses.fr/2017AZUR4002/document.
To a great extent, the success of the Web of Data depends on the ability to reach legacy data locked in silos inaccessible from the web. In the last 15 years, various works have tackled the problem of exposing structured data in the Resource Description Framework (RDF). Meanwhile, the overwhelming success of NoSQL databases has made the database landscape more diverse than ever, and NoSQL databases are strong potential contributors of valuable linked open data. Hence, the object of this thesis is to enable RDF-based data integration over heterogeneous data sources and, in particular, to harness NoSQL databases to populate the Web of Data. We propose a generic mapping language, xR2RML, to describe the mapping of heterogeneous data sources into an arbitrary RDF representation. xR2RML relies on and extends previous works on the translation of RDBs, CSV/TSV and XML into RDF. With such an xR2RML mapping, we propose either to materialize RDF data or to dynamically evaluate SPARQL queries on the native database. For the latter, we follow a two-step approach: the first step translates a SPARQL query into a pivot abstract query, based on the xR2RML mapping of the target database to RDF; the second step translates the abstract query into a concrete query, taking into account the specificities of the database query language. Great care is taken over query optimization opportunities, both at the abstract and at the concrete level. To demonstrate the effectiveness of our approach, we have developed a prototype implementation for MongoDB, the popular NoSQL document store, and validated the method on a real-life use case in Digital Humanities.
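The flavor of the second translation step can be sketched minimally: one SPARQL-like triple pattern becomes a MongoDB filter/projection pair. The predicate-to-field mapping and field names are invented; real xR2RML mappings are far richer than this.

```python
# Minimal triple-pattern-to-MongoDB translation sketch.
mapping = {  # predicate IRI -> MongoDB document field (invented)
    "http://schema.org/name": "name",
    "http://schema.org/birthDate": "birth",
}

def triple_to_mongo(subject, predicate, obj):
    field = mapping[predicate]
    if obj.startswith("?"):                   # unbound variable: project field
        return {field: {"$exists": True}}, {field: 1}
    return {field: obj}, {field: 1}           # bound literal: match its value

query_filter, projection = triple_to_mongo("?p", "http://schema.org/name", "Hugo")
print(query_filter)   # {'name': 'Hugo'} -> collection.find(filter, projection)
print(projection)     # {'name': 1}
```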
Galbaud, du Fort Guillaume. "Epidémiologie et santé mentale du couple : etude comparée de données populationnelles et de données cliniques." Thesis, McGill University, 1991. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=59993.
The primary results from the study of 845 couples in the general population suggest that there exists a significant spouse similarity across the various mental health dimensions examined (psychological distress, general well-being, and role satisfaction).
The main results from the study of 17 couples in marital therapy suggest that significant sex differences exist in dyadic adjustment. Sex differences were also noted in the correlations between dyadic adjustment and depressive symptoms.
In conclusion, it appears that epidemiological research on the mental health of couples should have as its objective a simultaneous consideration of both the individual and the couple, as well as a simultaneous consideration of clinical and general populations, in order to create a double complementarity out of this apparent double dichotomy.
Lechevalier, Fabien. "Les fiducies de données personnelles de santé : étude illustrée des enjeux et bénéfices d’une gestion collective de la propriété des données personnelles de santé." Master's thesis, Université Laval, 2020. http://hdl.handle.net/20.500.11794/67590.
Full textLétourneau, François. "Analyse du potentiel de l'approche entrepôt de données pour l'intégration des métadonnées provenant d'un ensemble de géorépertoires disponibles sur Internet." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1998. http://www.collectionscanada.ca/obj/s4/f2/dsk2/tape17/PQDD_0007/MQ31752.pdf.
Full textBabilliot, Alain. "Typologie critique des méthodes informatiques pour l'analyse des données en épidémiologie." Paris 9, 1988. https://portail.bu.dauphine.fr/fileviewer/index.php?doc=1988PA090033.
Full textBuekens, Pierre. "Utilisation des bases de données pour l'évaluation de l'efficacité des interventions obstétricales." Doctoral thesis, Universite Libre de Bruxelles, 1988. http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/213404.
Full textMegdiche, Bousarsar Imen. "Intégration holistique et entreposage automatique des données ouvertes." Thesis, Toulouse 3, 2015. http://www.theses.fr/2015TOU30214/document.
Statistical Open Data present useful information for feeding decision-making systems. Their integration and storage within these systems is achieved through ETL processes. It is necessary to automate these processes in order to make them accessible to non-experts. These processes also need to face the problems of missing schemas and of the structural and semantic heterogeneity that characterize Open Data. To meet these issues, we propose a new ETL approach based on graphs. For the extraction, we propose automatic activities performing detection and annotation based on a table model. For the transformation, we propose a linear program achieving the holistic integration of several graphs; this model supplies an optimal and unique solution. For the loading, we propose a progressive process for the definition of the multidimensional schema and the augmentation of the integrated graph. Finally, we present a prototype and experimental evaluations.
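Graph integration as an optimization problem can be echoed in a simplified form: align the nodes of two schema graphs by maximizing label similarity via an assignment solver. The labels and similarity scores below are invented, and this linear assignment is a simplification of the thesis's holistic linear program.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Align two node sets by maximizing total pairwise similarity.
nodes_a = ["year", "region", "population"]
nodes_b = ["annee", "zone", "habitants"]
similarity = np.array([  # e.g. from string/semantic similarity measures
    [0.9, 0.1, 0.0],
    [0.1, 0.7, 0.2],
    [0.0, 0.2, 0.8],
])

rows, cols = linear_sum_assignment(-similarity)  # negate to maximize
for i, j in zip(rows, cols):
    print(f"{nodes_a[i]} <-> {nodes_b[j]} (score {similarity[i, j]})")
```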
Weber-Baghdiguian, Lexane. "Santé, genre et qualité de l'emploi : une analyse sur données microéconomiques." Thesis, Paris Sciences et Lettres (ComUE), 2017. http://www.theses.fr/2017PSLED014/document.
This thesis studies the influence of work on job and life quality, the latter being considered through the perception that individuals have of their own health. The first chapter focuses on the long-term effects of job losses due to plant closure on job quality. We show that job loss has lasting negative effects on wages, perceived job security, the quality of the working environment and job satisfaction. The last two chapters investigate gender differences in self-reported health. The second chapter provides descriptive evidence on the relationships between self-assessed health, gender and mental health problems, i.e. depression and/or affective pains. Finally, in the last chapter, we study the influence of social norms, as proxied by the gender structure of the workplace environment, on gender differences in self-reported health. We show that both women and men working in female-dominated environments report more specific health problems than those who work in male-dominated environments. The overall findings of this thesis are twofold. First, losing a job has a long-run negative impact on several dimensions of job quality and satisfaction. Second, mental diseases and social norms at work are important for understanding gender-related differences in health perceptions.
Teste, Olivier. "Modélisation et manipulation des systèmes OLAP : de l'intégration des documents à l'usager." Habilitation à diriger des recherches, Université Paul Sabatier - Toulouse III, 2009. http://tel.archives-ouvertes.fr/tel-00479460.
Loudcher, Sabine. "Vers l'OLAP sémantique pour l'analyse en ligne des données complexes." Habilitation à diriger des recherches, Université Lumière - Lyon II, 2011. http://tel.archives-ouvertes.fr/tel-00606847.
Full textRapin, Jérémy. "Décompositions parcimonieuses pour l'analyse avancée de données en spectrométrie pour la Santé." Thesis, Paris 11, 2014. http://www.theses.fr/2014PA112378/document.
Blind source separation aims at extracting unknown source signals from observations in which these sources are mixed together by an unknown process. However, this very generic and unsupervised approach does not always provide exploitable results. Therefore, it is often necessary to add constraints, generally arising from physical considerations, in order to favor the recovery of sources with a particular sought-after structure. Non-negative matrix factorization (NMF), the main focus of this thesis, searches for non-negative sources that are observed through non-negative linear mixtures. In some cases, further information is still necessary to correctly separate the sources. Here, we focus on the sparsity concept, which helps improve the contrast between the sources while providing very robust approaches, even when the data are contaminated by noise. We show that, in order to obtain stable solutions, the non-negativity and sparsity constraints must be applied adequately. In addition, using sparsity in a potentially redundant transformed domain can capture the structure of most natural images, but this kind of regularization proves difficult to apply together with the non-negativity constraint in the direct domain. We therefore propose a sparse NMF algorithm, named nGMCA (non-negative Generalized Morphological Component Analysis), which overcomes these difficulties by making use of proximal calculus techniques. Experiments on simulated data show that this algorithm is robust to additive Gaussian noise contamination, with automatic control of the sparsity parameter. This novel algorithm also proves more efficient and robust than other state-of-the-art NMF algorithms on realistic data. Finally, we apply nGMCA to liquid chromatography - mass spectrometry data. Observation of these data shows that they are contaminated by multiplicative noise, which greatly deteriorates the results of NMF algorithms. An extension of nGMCA was designed to take this type of noise into account, thanks to the use of a non-stationary prior, and obtains excellent results on annotated real data.
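Sparse NMF in the same spirit (though not the nGMCA algorithm itself) can be run with scikit-learn's NMF and an l1 penalty, assuming a recent scikit-learn (>= 1.0 API) and synthetic spectra-like data.

```python
import numpy as np
from sklearn.decomposition import NMF

# Recover sparse non-negative sources from non-negative mixtures.
rng = np.random.default_rng(0)
sources = np.abs(rng.standard_normal((2, 200)))   # 2 non-negative sources
sources[sources < 1.0] = 0.0                      # make them sparse
mixing = np.abs(rng.standard_normal((10, 2)))     # non-negative mixing matrix
X = mixing @ sources + 0.01 * np.abs(rng.standard_normal((10, 200)))

model = NMF(n_components=2, init="nndsvda", alpha_H=0.1, l1_ratio=1.0,
            max_iter=500, random_state=0)
W = model.fit_transform(X)     # estimated mixing coefficients
H = model.components_          # estimated sparse sources
print(W.shape, H.shape)        # (10, 2) (2, 200)
```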
Flamant-Hulin, Marion. "Pollution intérieure et santé respiratoire : données issues des milieux urbain et rural." Paris 6, 2010. http://www.theses.fr/2010PA066721.
Full textVirouleau, Alain. "Apprentissage statistique pour la détection de données aberrantes et application en santé." Thesis, Institut polytechnique de Paris, 2020. http://www.theses.fr/2020IPPAX028.
Full textThe problems of outlier detection and robust regression in a high-dimensional setting are fundamental in statistics and have numerous applications. Following a recent set of works providing methods for simultaneous robust regression and outlier detection, we consider in a first part a model of linear regression with individual intercepts, in a high-dimensional setting. We introduce a new procedure for the simultaneous estimation of the linear regression coefficients and intercepts, using two dedicated sorted-l1 convex penalizations, also called SLOPE. We develop a complete theory for this problem: first, we provide sharp upper bounds on the statistical estimation error of both the vector of individual intercepts and the regression coefficients. Second, we give an asymptotic control on the False Discovery Rate (FDR) and the statistical power for support selection of the individual intercepts. Numerical illustrations, with a comparison to recent alternative approaches, are provided on both simulated and several real-world datasets. Our second part is motivated by a genetic problem. Among particular DNA sequences called microsatellites, which are indicators of the development of colorectal cancer tumors, we want to find the sequences that have a much higher (resp. much lower) rate of mutation than expected by biologist experts. This problem leads to a non-linear probabilistic model and thus goes beyond the scope of the first part. In this second part we therefore consider generalized linear models with individual intercepts added to the linear predictor, and explore the statistical properties of a new procedure for the simultaneous estimation of the regression coefficients and intercepts, again using the sorted-l1 penalization. In this part we focus only on the low-dimensional case and are again interested in the performance of our procedure in terms of statistical estimation error and FDR
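Since the sorted-l1 (SLOPE) penalty is central to both parts, a short sketch of its proximal operator may help; the stack-based pooling below follows the well-known fast prox algorithm for SLOPE (Bogdan et al.), while the function name and interface are our own illustrative choices.

```python
import numpy as np

def prox_sorted_l1(y, lambdas):
    """Prox of the sorted-l1 norm: larger penalties hit larger entries.
    `lambdas` must be non-negative and non-increasing."""
    sign = np.sign(y)
    y_abs = np.abs(y)
    order = np.argsort(y_abs)[::-1]          # sort |y| in decreasing order
    z = y_abs[order] - lambdas               # componentwise soft threshold
    # Pool adjacent violators to enforce a non-increasing solution
    blocks = []                              # [start, end, sum, average]
    for i in range(len(z)):
        blocks.append([i, i, z[i], z[i]])
        while len(blocks) > 1 and blocks[-2][3] <= blocks[-1][3]:
            b = blocks.pop()
            blocks[-1][1] = b[1]
            blocks[-1][2] += b[2]
            blocks[-1][3] = blocks[-1][2] / (blocks[-1][1] - blocks[-1][0] + 1)
    x_sorted = np.zeros_like(z)
    for start, end, _, value in blocks:
        x_sorted[start:end + 1] = max(value, 0.0)   # clip at zero
    x = np.zeros_like(y_abs)
    x[order] = x_sorted                      # undo the sorting
    return sign * x
```

Calling `prox_sorted_l1` inside any proximal-gradient loop yields a basic SLOPE solver; penalizing the largest coefficients most heavily is what gives SLOPE its FDR-control behavior for support selection.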
Baldi, Isabelle. "Santé et environnement en Aquitaine : bilan des données disponibles et perspectives épidémiologiques." Bordeaux 2, 1995. http://www.theses.fr/1995BOR23023.
Full textFaria, Maria Paula Marçal Grilo Lobato de. "Données génétiques informatisées : un nouveau défi à la protection du droit à la confidentialité des données personnelles de santé." Bordeaux 4, 1996. http://www.theses.fr/1996BOR40030.
Full textAfter describing the dangers posed to human privacy by the "new genetics" and by informatics, this thesis concludes, through an analysis of the Portuguese legal framework from a comparative law perspective covering the right to confidentiality, medical secrecy, and personal data protection laws, that contemporary law needs a special legal statute to govern the confidentiality of personal genetic health data, without which fundamental human rights will be under threat
Pacitti, Esther. "Réplication asynchrone des données dans trois contextes: entrepôts, grappes et systèmes pair-à-pair." Habilitation à diriger des recherches, Université de Nantes, 2008. http://tel.archives-ouvertes.fr/tel-00473969.
Full textTriki, Salah. "Sécurisation des entrepôts de données : de la conception à l’exploitation." Thesis, Lyon 2, 2013. http://www.theses.fr/2013LYO22026.
Full textCompanies have to make strategic decisions that involve competitive advantages. In this decision-making context, the data warehouse concept emerged in the nineties. A data warehouse is a special kind of database that consolidates and historizes data from the operational information system of a company. Moreover, a company's data are proprietary and sensitive and should not be disclosed without controls. Indeed, some data are personal and may harm their owners when disclosed, for example medical data or religious and ideological beliefs. Thus, many governments have enacted laws to protect the private lives of their citizens, and organizations are therefore forced to implement strict security measures to comply with these laws. Our work takes place in the context of secure data warehouses, which can be addressed at two levels: (i) the design level, which aims to produce a secure data storage schema, and (ii) the operating level, which aims to enforce users' access rights and to prevent a malicious user from inferring prohibited data from the data he is allowed to access. For securing the design level, we have made three contributions. The first contribution is a specification language for secure storage. This language is a UML profile called SECDW+, an extended version of SECDW that takes conflicts of interest into account at design time. SECDW is a UML profile for specifying security concepts in a data warehouse by adopting the standard RBAC and MAC security models. Although SECDW allows the designer to specify which role has access to which part of the data warehouse, it does not take conflicts of interest into account; thus, through stereotypes and tagged values, we extended SECDW to allow the definition of conflicts of interest for the various elements of a multidimensional model. Our second contribution at this level is an approach for detecting potential inferences at design time. Our approach is based on the class diagram of the data sources to detect inferences at the conceptual level. Note that preventing inferences at this level reduces the cost of administering the OLAP server used to manage access to the data warehouse. Finally, our third contribution to the design of a secure warehouse consists of rules for analyzing the consistency of the modeled authorizations. As for the operating level, we proposed: an architecture for reinforcing the configured access rights, a method for the prevention of inferences, and a method for respecting the constraints on additive measures. The proposed architecture adds, to the access control system typically present in any secure DBMS, a module that prevents inferences; this module implements our security methods against inferences and our handling of additivity constraints. Our inference-prevention method handles both types of inferences: precise and partial. For precise inferences, our method is based on Bayesian networks: it builds the Bayesian networks corresponding to user queries using the MAX and MIN functions, and prohibits those queries that are likely to generate inferences. We proposed a set of definitions to translate the result of a query into a Bayesian network and, based on these definitions, developed algorithms that construct the Bayesian networks and prohibit the queries likely to generate inferences. In addition, to keep the response time of this prevention treatment reasonable, we proposed a technique for predicting the potential queries to prohibit. The technique is based on the frequency with which queries follow one another, in order to determine the most likely query to follow the request being processed. Beyond precise inferences (performed through queries using the MIN and MAX functions), our method also addresses partial inferences made through queries using the SUM function. Inspired by statistical database techniques, it relies on the distribution of the data in the warehouse to decide whether to prohibit or allow the execution of a query
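To see the kind of partial inference the SUM-based defense must block, consider the classic tracker attack, sketched below on a toy table; the salary data, the function names, and the one-rule auditor are illustrative assumptions, far simpler than the distribution-based decision described above.

```python
# Toy query auditor: two allowed SUM queries whose sets differ by exactly
# one individual leak that individual's value by subtraction.
salaries = {"alice": 3000, "bob": 4200, "carol": 3800}

answered = []  # query sets already answered

def audit_sum(query_set):
    """Refuse a SUM query if, combined with a past one, it isolates a value."""
    for past in answered:
        if len(past ^ set(query_set)) == 1:
            return None  # the difference would reveal a single salary
    answered.append(set(query_set))
    return sum(salaries[k] for k in query_set)

print(audit_sum(["alice", "bob", "carol"]))  # 11000, allowed
print(audit_sum(["alice", "bob"]))           # None: would expose carol's value
```

The thesis's method is more refined, using the data distribution rather than a purely set-based rule, but the subtraction attack above is the core threat both are guarding against.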
Teste, Olivier. "Modélisation et manipulation d'entrepôts de données complexes et historisées." Phd thesis, Université Paul Sabatier - Toulouse III, 2000. http://tel.archives-ouvertes.fr/tel-00088986.
Full textAt the warehouse level, we define a data model for describing the temporal evolution of complex objects. In our proposal, a warehouse object integrates current, past, and archived states, modeling the decision-support data and their evolution. The extension of the object concept leads to an extension of the class concept, composed of (temporal and archive) filters used to build the past and archived states, as well as a construction function modeling the extraction process (source origin). We also introduce the concept of environment, which defines consistent temporal parts whose sizes are adapted to decision-makers' requirements. Data manipulation is an extension of object algebras that takes into account the characteristics of the warehouse representation model; the extension concerns temporal operators and operators for manipulating sets of states.
At the data mart level, we define a multidimensional data model for representing information as a constellation of facts and dimensions with multiple hierarchies. Data manipulation relies on an algebra covering the full set of multidimensional operations and offering operations specific to our model. We also propose a method for building the data marts from the warehouse.
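As an illustration of this constellation structure (several facts sharing dimensions, each dimension carrying multiple hierarchies), here is a minimal sketch using Python dataclasses; the entity names ("Stay", "Act") and attributes are hypothetical, not taken from the models of the thesis.

```python
from dataclasses import dataclass

@dataclass
class Hierarchy:
    name: str
    levels: list[str]                 # ordered from finest to coarsest

@dataclass
class Dimension:
    name: str
    hierarchies: list[Hierarchy]      # multiple hierarchies per dimension

@dataclass
class Fact:
    name: str
    measures: list[str]
    dimensions: list[Dimension]

time = Dimension("Time", [Hierarchy("calendar", ["day", "month", "year"]),
                          Hierarchy("weekly", ["day", "week", "year"])])
patient = Dimension("Patient", [Hierarchy("geo", ["patient", "city", "region"])])

# A constellation: two facts sharing the Time and Patient dimensions.
constellation = [Fact("Stay", ["duration"], [time, patient]),
                 Fact("Act", ["cost"], [time, patient])]
```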
To validate our proposals, we present the GEDOOH software (Générateur d'Entrepôts de Données Orientées Objet et Historisées), which supports the design and creation of warehouses in the context of the REANIMATIC medical application.
Cury, Alexandre. "Techniques d'anormalité appliquées à la surveillance de santé structurale." Phd thesis, Université Paris-Est, 2010. http://tel.archives-ouvertes.fr/tel-00581772.
Full textTremblay, Bénédicte L. "Analyse intégrée des données omiques dans l'impact de l'alimentation sur la santé cardiométabolique." Doctoral thesis, Université Laval, 2021. http://hdl.handle.net/20.500.11794/69041.
Full textAfter cancer, cardiovascular disease (CVD) is the second leading cause of death and one of the leading causes of hospitalization in Canada. CVD management is based on the assessment and treatment of several cardiometabolic risk factors, which include metabolic syndrome, physical activity, and diet. A healthy lifestyle, including a balanced diet, remains the key to preventing CVD. A diet rich in fruits and vegetables is inversely associated with CVD incidence. Biomarkers of exposure to diet are used to study the impact of dietary factors on the development of CVD. Plasma carotenoids, a biomarker of fruit and vegetable consumption, are associated with cardiometabolic health. Diet also influences a myriad of omics factors, thus modulating CVD risk. Omics sciences study the complex set of molecules that make up the body. Among these sciences, genomics, epigenomics, transcriptomics, and metabolomics consider the large-scale study of genes, DNA methylation, gene expression, and metabolites, respectively. Given that a single type of omics data usually does not capture the complexity of biological processes, an integrative approach combining multiple omics data proves ideal to elucidate the pathophysiology of diseases. Systems biology studies the complex interactions of different omics data among themselves and with the environment on a trait such as health. There are several methods for analyzing and integrating omics data. Quantitative genetics estimates the contributions of genetic and environmental effects to the variance of complex traits such as omics data. Weighted correlation network analysis makes it possible to associate a large number of omics data with a trait such as a disease risk factor. The general objective of this thesis is to study the impact of omics determinants in the link between diet and cardiometabolic health. The first specific objective, using a quantitative genetics approach, is to characterize the heritability of omics data and plasma carotenoids, as well as to check whether their link with cardiometabolic risk factors can be explained by genetic and environmental factors. The second specific objective, using a weighted correlation network approach, is to assess the role of individual and combined omics data in the relationship between plasma carotenoids and lipid profile. This project is based on the GENERATION observational study, which includes 48 healthy subjects from 16 families. All omics data studied showed familial resemblances due, to varying degrees, to genetic and common environmental effects. Genetics and environment are also involved in the link between DNA methylation and gene expression, as well as between metabolites, carotenoids, and cardiometabolic risk factors. Moreover, weighted correlation network analysis provided insight into the interactive molecular system that links carotenoids, DNA methylation, gene expression, and lipid profile. In conclusion, the present study, using approaches from quantitative genetics and weighted correlation network analysis, brought to light the impact of some individual and combined omics data in the link between diet and cardiometabolic health
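As a rough illustration of the weighted correlation network step, the sketch below builds an unsigned adjacency by soft-thresholding feature correlations and cuts the resulting dendrogram into modules; the power beta = 6, the clustering choices, and the random data are assumptions, not the pipeline actually used in the thesis.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def wgcna_modules(X, beta=6, n_modules=4):
    """Build an unsigned weighted adjacency from feature correlations and
    cut the resulting dendrogram into feature modules."""
    corr = np.corrcoef(X, rowvar=False)        # feature-feature correlations
    adjacency = np.abs(corr) ** beta           # soft thresholding
    dissimilarity = 1.0 - adjacency            # similarity -> distance
    iu = np.triu_indices_from(dissimilarity, k=1)
    tree = linkage(dissimilarity[iu], method="average")
    return fcluster(tree, t=n_modules, criterion="maxclust")

# Each module's "eigengene" (first principal component) can then be
# correlated with a trait such as plasma carotenoids or lipid profile.
X = np.random.default_rng(0).normal(size=(48, 200))  # 48 subjects, 200 features
print(wgcna_modules(X)[:20])
```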
Evans, David. "L'estimation des effets des interventions de santé publique à partir des données observationnelles." Paris 6, 2013. http://www.theses.fr/2013PA066694.
Full textIn this thesis, we focused on how to conduct an epidemiological analysis so as to make it more directly informative for policy decision-making and for the design of interventions. This approach led us to favor certain methodological principles and approaches that have seen recent conceptual and technical advances. These approaches were explored and developed in the two articles published as part of this thesis. In the first article, we proposed an approach for selecting adjustment variables in an epidemiological analysis that combines the a priori hypotheses encoded in a directed acyclic graph (DAG) with a covariate selection method, namely the "change-in-estimate" procedure. In the second article, we estimated the association between the number of patients treated by peritoneal dialysis in a centre and treatment outcomes, using DAGs to present the hypotheses and justify the choice of adjustment variables, a probabilistic sensitivity analysis, and an estimation of the effects of interventions that would change the number of patients treated in centres. In the standard analysis, there was a protective association between the number of patients treated and the risk of transfer to haemodialysis; in the policy-centred analysis, the effect was still protective but smaller. This work raised several conceptual and technical questions that could be the subject of future research
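For readers unfamiliar with the "change-in-estimate" procedure mentioned above, here is a toy sketch under simplifying assumptions (a linear model fitted by ordinary least squares, a 10% relative-change threshold, drop-one comparisons against the full model); the thesis itself combines the procedure with DAG-encoded hypotheses, which this sketch does not attempt.

```python
import numpy as np

def ols_coef(X, y):
    """Least-squares coefficients (intercept is the first column of X)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def change_in_estimate(exposure, candidates, y, threshold=0.10):
    """Keep a candidate confounder if dropping it changes the exposure
    coefficient by more than `threshold` (relative change)."""
    n = len(y)
    full = np.column_stack([np.ones(n), exposure] + candidates)
    beta_full = ols_coef(full, y)[1]            # exposure effect, full model
    denom = max(abs(beta_full), 1e-12)          # guard against division by 0
    kept = []
    for j in range(len(candidates)):
        reduced_covs = [c for k, c in enumerate(candidates) if k != j]
        reduced = np.column_stack([np.ones(n), exposure] + reduced_covs)
        beta_red = ols_coef(reduced, y)[1]
        if abs(beta_red - beta_full) / denom > threshold:
            kept.append(j)                      # estimate shifted: confounder
    return kept
```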
Taiello, Riccardo. "Apprentissage automatique sécurisé pour l'analyse collaborative des données de santé à grande échelle." Electronic Thesis or Diss., Université Côte d'Azur, 2024. http://www.theses.fr/2024COAZ4031.
Full textThis PhD thesis explores the integration of privacy preservation, medical imaging, and Federated Learning (FL) using advanced cryptographic methods. Within the context of medical image analysis, we develop a privacy-preserving image registration (PPIR) framework. This framework addresses the challenge of registering images confidentially, without revealing their contents. By extending classical registration paradigms, we incorporate cryptographic tools like secure multi-party computation and homomorphic encryption to perform these operations securely. These tools are vital as they prevent data leakage during processing. Given the challenges associated with the performance and scalability of cryptographic methods in high-dimensional data, we optimize our image registration operations using gradient approximations. Our focus extends to increasingly complex registration methods, such as rigid, affine, and non-linear approaches using cubic splines or diffeomorphisms, parameterized by time-varying velocity fields. We demonstrate how these sophisticated registration methods can integrate privacy-preserving mechanisms effectively across various tasks. Concurrently, the thesis addresses the challenge of stragglers in FL, emphasizing the role of Secure Aggregation (SA) in collaborative model training. We introduce "Eagle", a synchronous SA scheme designed to optimize participation by late-arriving devices, significantly enhancing computational and communication efficiencies. We also present "Owl", tailored for buffered asynchronous FL settings, consistently outperforming earlier solutions. Furthermore, in the realm of Buffered AsyncSA, we propose two novel approaches: "Buffalo" and "Buffalo+". "Buffalo" advances SA techniques for Buffered AsyncSA, while "Buffalo+" counters sophisticated attacks that traditional methods fail to detect, such as model replacement. This solution leverages the properties of incremental hash functions and explores the sparsity in the quantization of local gradients from client models. Both Buffalo and Buffalo+ are validated theoretically and experimentally, demonstrating their effectiveness in a new cross-device FL task for medical devices. Finally, this thesis has devoted particular attention to the translation of privacy-preserving tools into real-world applications, notably through the FL open-source framework Fed-BioMed. The contributions include one of the first practical SA implementations specifically designed for cross-silo FL among hospitals, showcasing several practical use cases
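As background for the SA schemes named above, the following sketch shows the classic pairwise-masking construction on which secure aggregation protocols build; the seed handling and modulus are illustrative assumptions, and real protocols such as Eagle or Owl add key agreement, dropout handling, and asynchrony on top.

```python
import numpy as np

MOD = 2**32  # all arithmetic is done modulo 2^32

def masked_update(client_id, update, peers, seeds):
    """Each pair (i, j) shares a seed; i adds the mask, j subtracts it,
    so all masks cancel in the server-side sum."""
    masked = update.astype(np.uint64) % MOD
    for peer in peers:
        if peer == client_id:
            continue
        rng = np.random.default_rng(seeds[frozenset((client_id, peer))])
        mask = rng.integers(0, MOD, size=update.shape, dtype=np.uint64)
        if client_id < peer:
            masked = (masked + mask) % MOD
        else:
            masked = (masked - mask) % MOD
    return masked

# The server only sees masked vectors, yet their sum equals the true sum.
clients = [0, 1, 2]
seeds = {frozenset(p): hash(p) % (2**31) for p in [(0, 1), (0, 2), (1, 2)]}
updates = [np.array([5, 7], dtype=np.uint64) for _ in clients]
total = sum(masked_update(c, updates[c], clients, seeds) for c in clients) % MOD
print(total)  # [15 21]: individual updates stay hidden, the sum is exact
```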
Ravat, Franck. "Modèles et outils pour la conception et la manipulation de systèmes d'aide à la décision." Habilitation à diriger des recherches, Université des Sciences Sociales - Toulouse I, 2007. http://tel.archives-ouvertes.fr/tel-00379779.
Full textFor data warehouses (DWs), our objective was to provide solutions for modeling the evolution of decision-support data (an extension of an object model) and for integrating textual data without fixing their schema a priori. For data marts (DMs), we proposed a core multidimensional model together with various extensions meeting decision-makers' needs. These extensions support the management of indicators and textual data, temporal evolution (versions), the consistency of the data and of their analyses (semantic constraints), the integration and capitalization of decision-makers' expertise (annotations), and the personalization of multidimensional schemas (weights). This work was completed by a design method whose advantage is to take into account both decision-makers' requirements and the data sources. This method models the static aspect (decision-support data) as well as the dynamic aspect (the loading process of the decision-support system).
Regarding data manipulation, we proposed an algebra complemented by a decision-maker-oriented graphical language and a declarative language. Our proposals were validated through participation in various projects, the co-supervision of 5 doctoral theses, and the supervision of several Master's research projects.
Ahmad, Houda. "Une approche matérialisée basée sur les vues pour l'intégration de documents XML." Phd thesis, Grenoble 1, 2009. http://www.theses.fr/2009GRE10086.
Full textSemi-structured data play an increasing role in the development of the Web through the use of XML. However, the management of semi-structured data poses specific problems because semi-structured data, contrary to classical databases, do not rely on a predefined schema. The schema of a document is contained in the document itself, and similar documents may be represented by different schemas. Consequently, the techniques and algorithms used for querying or integrating these data are more complex than those used for structured data. The objective of our work is the integration of XML data by using the principles of Osiris, a prototype of KB-DBMS in which views are a central concept. In this system, a family of objects is defined by a hierarchy of views, where a view is defined by its parent views and its own attributes and constraints. Osiris belongs to the family of Description Logics; the minimal view of a family of objects is assimilated to a primitive concept and its other views to defined concepts. An object of a family satisfies some of its views. For each family of objects, Osiris builds an n-dimensional classification space by analysing the constraints defined in all of its views. This space is used for object classification and indexing. In this thesis we study the contribution of the main features of Osiris - classification, indexing and semantic query optimization - to the integration of XML documents. For this purpose we produce a target schema (an abstract XML schema) which represents an Osiris schema; every document satisfying a source schema (a concrete XML schema) is rewritten in terms of the target schema before undergoing the extraction of the values of its entities. The objects corresponding to these entities are then classified and indexed. The Osiris mechanism for semantic query optimization can then be used to extract the objects of interest for a query
Roatis, Alexandra. "Efficient Querying and Analytics of Semantic Web Data." Thesis, Paris 11, 2014. http://www.theses.fr/2014PA112218/document.
Full textThe utility and relevance of data lie in the information that can be extracted from it. The high rate of data publication and its increased complexity, for instance the heterogeneous, self-describing Semantic Web data, motivate the interest in efficient techniques for data manipulation. In this thesis we leverage mature relational data management technology for querying Semantic Web data. The first part focuses on query answering over data subject to RDFS constraints, stored in relational data management systems. The implicit information resulting from RDF reasoning is required to correctly answer such queries. We introduce the database fragment of RDF, going beyond the expressive power of previously studied fragments. We devise novel techniques for answering Basic Graph Pattern queries within this fragment, exploring the two established approaches for handling RDF semantics, namely graph saturation and query reformulation. In particular, we consider graph updates within each approach and propose a method for incrementally maintaining the saturation. We experimentally study the performance trade-offs of our techniques, which can be deployed on top of any relational data management engine. The second part of this thesis considers the new requirements for data analytics tools and methods emerging from the development of the Semantic Web. We fully redesign, from the bottom up, core data analytics concepts and tools in the context of RDF data. We propose the first complete formal framework for warehouse-style RDF analytics. Notably, we define analytical schemas tailored to heterogeneous, semantic-rich RDF graphs, analytical queries which (beyond relational cubes) allow flexible querying of the data and the schema as well as powerful aggregation and OLAP-style operations. Experiments on a fully-implemented platform demonstrate the practical interest of our approach
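As a concrete picture of graph saturation, the first of the two approaches mentioned above, here is a toy fixpoint computation over two RDFS entailment rules; the in-memory triple store and the restriction to rdfs9/rdfs11 are simplifying assumptions, and the thesis's database fragment covers considerably more.

```python
# A minimal sketch of RDFS graph saturation: derive implicit triples until
# a fixpoint, using only subClassOf transitivity (rdfs11) and type
# propagation along subClassOf (rdfs9).
def saturate(triples):
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        new = set()
        for s, p, o in facts:
            if p == "rdfs:subClassOf":
                for s2, p2, o2 in facts:
                    if p2 == "rdfs:subClassOf" and s2 == o:
                        new.add((s, "rdfs:subClassOf", o2))   # rdfs11
                    if p2 == "rdf:type" and o2 == s:
                        new.add((s2, "rdf:type", o))          # rdfs9
        if not new <= facts:
            facts |= new
            changed = True
    return facts

# Example: the implicit triple (:alice, rdf:type, :Agent) is derived.
g = {(":Student", "rdfs:subClassOf", ":Person"),
     (":Person", "rdfs:subClassOf", ":Agent"),
     (":alice", "rdf:type", ":Student")}
print(sorted(saturate(g)))
```

Query reformulation, the alternative approach studied in the thesis, leaves the graph as-is and instead rewrites the query to account for the same entailments, which trades storage for query-time work.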
Alatrista-Salas, Hugo. "Extraction de relations spatio-temporelles à partir des données environnementales et de la santé." Phd thesis, Université Montpellier II - Sciences et Techniques du Languedoc, 2013. http://tel.archives-ouvertes.fr/tel-00997539.
Full text