Theses on the topic "Entrepôts de données – Médecine"
Consult the top 50 theses for your research on the topic "Entrepôts de données – Médecine".
You can also download the full text of each academic publication in PDF format and read its abstract online whenever it is available in the metadata.
Explore theses on a wide variety of disciplines and organize your bibliography correctly.
Assele, Kama Ariane. "Interopérabilité sémantique et entreposage de données cliniques". Paris 6, 2013. http://www.theses.fr/2013PA066359.
In medicine, data warehouses make it possible to integrate various data sources for decisional analysis. The integrated data often come from distributed and heterogeneous sources, in order to provide an overview of the information to analysts and decision makers. Clinical data warehousing raises the issue of representing medical knowledge that is constantly evolving, requiring new methodologies to integrate the semantic dimension of the study domain. The storage problem is related to the complexity of the field to describe and model, but more importantly, to the need to combine domain knowledge with data. Therefore, one of the research topics in the field of data warehouses concerns the cohabitation of knowledge and data, and the role of ontologies in data warehouse modeling, data integration and data mining. This work, carried out in an INSERM research laboratory specialized in health knowledge engineering (UMRS 872 EQ20), is part of an effort on the modeling, sharing and use of clinical data within a semantic interoperability platform. To address this issue, we support the thesis that: (i) the integration of a standardized information model with a knowledge model allows semantic data warehouses to be implemented in order to optimize data use; (ii) the use of terminological and ontological resources aids the interconnection of distributed and heterogeneous resources; (iii) data representation impacts its exploitation and helps to optimize decision support systems (e.g. monitoring tools). Using innovative methods and Semantic Web tools, we have optimized the integration and exploitation of clinical data for the implementation of a monitoring system to assess the evolution of bacterial resistance to antibiotics in Europe. As a first step, we defined the multidimensional model of a semantic data warehouse based on existing standards such as HL7. We subsequently articulated these data with domain knowledge of infectious diseases. For this, we represented the data through their structure, vocabulary and semantics in an ontology called a « data definition ontology », in order to map the data to the domain ontology via mapping rules. We proposed a method for the semi-automatic generation of the « data definition ontology » from a database schema, using existing tools and project results. Finally, the data warehouse and semantic resources are accessed and used via a semantic interoperability system developed in the framework of the DebugIT European project (Detecting and Eliminating Bacteria UsinG Information Technology), which we experimented with at the G. Pompidou university hospital (HEGP, France).
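A minimal sketch of the general idea of a "data definition ontology"-style mapping between a relational schema and domain concepts, assuming hypothetical table, column and IRI names; it is not the thesis's actual model or mapping rules.

```python
# Sketch: relational columns are described by hypothetical ontology IRIs, and rows are
# exported as RDF-like triples. All names below are illustrative assumptions.

DATA_DEFINITION = {
    # (table, column) of the source schema -> concept/property in a hypothetical domain ontology
    ("lab_result", "organism_code"):  "http://example.org/onto#isolatedOrganism",
    ("lab_result", "antibiotic"):     "http://example.org/onto#testedAntibiotic",
    ("lab_result", "susceptibility"): "http://example.org/onto#susceptibilityResult",
}

def row_to_triples(table, row_id, row):
    """Translate one relational row into (subject, predicate, object) triples."""
    subject = f"http://example.org/data/{table}/{row_id}"
    triples = []
    for column, value in row.items():
        predicate = DATA_DEFINITION.get((table, column))
        if predicate:  # only columns described in the data definition ontology are exported
            triples.append((subject, predicate, str(value)))
    return triples

if __name__ == "__main__":
    sample = {"organism_code": "E. coli", "antibiotic": "ciprofloxacin", "susceptibility": "R"}
    for t in row_to_triples("lab_result", 42, sample):
        print(t)
```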
Loizillon, Sophie. "Deep learning for automatic quality control and computer-aided diagnosis in neuroimaging using a large-scale clinical data warehouse". Electronic Thesis or Diss., Sorbonne université, 2024. https://accesdistant.sorbonne-universite.fr/login?url=https://theses-intra.sorbonne-universite.fr/2024SORUS258.pdf.
Patients' hospitalisation generates data about their health, which is essential to ensure that they receive the best possible care. Over the last decade, clinical data warehouses (CDWs) have been created to exploit this vast amount of clinical information for research purposes. CDWs offer remarkable potential for research by bringing together a huge amount of real-world data of diverse nature (electronic health records, imaging data, pathology and laboratory tests...) from up to millions of patients. Access to such large clinical routine datasets, which are an excellent representation of what is acquired daily in clinical practice, is a major advantage in the development and deployment of powerful artificial intelligence models in clinical routine. Currently, most computer-aided diagnosis models are limited by training performed only on research datasets, with patients meeting strict inclusion criteria and data acquired under highly standardised research protocols, which differ considerably from the realities of clinical practice. This gap between research and clinical data leads AI systems to generalise poorly in clinical practice. This thesis examined how to leverage clinical data warehouse brain MRI data for research purposes. Because images gathered in CDWs are highly heterogeneous, especially regarding their quality, we first focused on developing an automated solution capable of effectively identifying corrupted images in CDWs. We improved the initial automated 3D T1-weighted brain MRI quality control developed by (Bottani et al. 2021) by proposing an innovative transfer learning method leveraging artefact simulation. In the second work, we extended our automatic quality control for T1-weighted MRI to another common anatomical sequence: 3D FLAIR. As machine learning models are sensitive to distribution shifts, we proposed a semi-supervised domain adaptation framework. Our automatic quality control tool was able to identify images that are not proper 3D FLAIR brain MRIs and to assess the overall image quality with a limited number of new manual annotations of FLAIR images. Lastly, we conducted a feasibility study to assess the potential of variational autoencoders for unsupervised anomaly detection. We obtained promising results showing a correlation between Fazekas scores and the volumes of lesions segmented by our model, as well as the robustness of the method to image quality. Nevertheless, we still observed failure cases where no lesion is detected at all in lesional cases, which prevents this type of model from being used in clinical routine for now. Although clinical data warehouses are an incredible research ecosystem that enables a better understanding of the health of the general population and, in the long term, contributes to the development of predictive and preventive medicine, their use for research purposes is not without difficulties.
El, Malki Mohammed. "Modélisation NoSQL des entrepôts de données multidimensionnelles massives". Thesis, Toulouse 2, 2016. http://www.theses.fr/2016TOU20139/document.
Decision support systems occupy a large space in companies and large organizations in order to enable analyses dedicated to decision making. With the advent of big data, the volume of analyzed data reaches critical sizes, challenging conventional approaches to data warehousing, for which current solutions are mainly based on R-OLAP databases. With the emergence of major Web platforms such as Google, Facebook, Twitter, Amazon, etc., many solutions to process big data have been developed, referred to as "Not Only SQL". These new approaches are an interesting attempt to build multidimensional data warehouses capable of handling large volumes of data. Questioning the R-OLAP approach requires revisiting the principles of modeling multidimensional data warehouses. In this manuscript, we propose implementation processes for multidimensional data warehouses with NoSQL models. We define four processes for each of two models: a column-oriented NoSQL model and a document-oriented model. Each of these processes fosters a specific treatment. Moreover, the NoSQL context adds complexity to the computation of effective pre-aggregates that are typically set up within the R-OLAP context (lattice). We have extended our implementation processes to take into account the construction of the lattice in both retained models. As it is difficult to choose a single NoSQL implementation that supports all applicable treatments effectively, we propose two translation processes. While the first one concerns intra-model processes, i.e., translation rules from one implementation to another within the same NoSQL logical model, the second process defines the transformation rules from an implementation of one logical model to an implementation of another logical model.
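To make the two families of layouts concrete, here is a small hedged sketch of the same star-schema fact laid out once as a column-family row and once as a nested document; all field, family and key names are hypothetical and not the thesis's exact models.

```python
# Illustrative sketch: one sales-like fact from a star schema represented under a
# column-family layout and under a document-oriented layout.

# Column-oriented layout: one row key, attributes grouped into column families.
columnar_row = {
    "row_key": "2015-06-01#store42#prod7",
    "families": {
        "dim_time":    {"day": "2015-06-01", "month": "2015-06", "year": "2015"},
        "dim_store":   {"store_id": "store42", "city": "Lyon"},
        "dim_product": {"product_id": "prod7", "category": "books"},
        "measures":    {"quantity": 3, "amount": 57.0},
    },
}

# Document-oriented layout: dimensions nested inside a single fact document.
document_fact = {
    "_id": "2015-06-01#store42#prod7",
    "time":    {"day": "2015-06-01", "month": "2015-06", "year": "2015"},
    "store":   {"store_id": "store42", "city": "Lyon"},
    "product": {"product_id": "prod7", "category": "books"},
    "quantity": 3,
    "amount": 57.0,
}

# A decisional query only touches a few columns, which is why grouping measures and
# dimensions into families (columnar case) or sub-documents (document case) matters.
print(columnar_row["families"]["measures"]["amount"], document_fact["amount"])
```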
Benitez-Guerrero, Edgard. "Infrastructure adaptable pour l'évolution des entrepôts de données". Université Joseph Fourier (Grenoble), 2002. http://tel.archives-ouvertes.fr/tel-00010335.
Sautot, Lucile. "Conception et implémentation semi-automatique des entrepôts de données : application aux données écologiques". Thesis, Dijon, 2015. http://www.theses.fr/2015DIJOS055/document.
This thesis concerns the semi-automatic design of data warehouses and the associated OLAP cubes for analyzing ecological data. The biological sciences, including ecology and agronomy, generate data that require an important collection effort: several years are often required to obtain a complete data set. Moreover, the objects and phenomena studied by these sciences are complex and require the recording of many parameters to be understood. Finally, the collection of complex data over a long time results in an increased risk of inconsistency. Thus, these sciences generate numerous and heterogeneous data, which can be inconsistent. It is therefore interesting to offer scientists who work in the life sciences information systems able to store and restore their data, particularly when those data have a significant volume. Among the existing tools, business intelligence tools, including online analytical processing (OLAP) systems, particularly caught our attention because they are data analysis processes working on large historical collections (i.e. a data warehouse) to provide support for decision making. Business intelligence offers tools that allow users to explore large volumes of data in order to discover patterns and knowledge within the data, and possibly confirm their hypotheses. However, OLAP systems are complex information systems whose implementation requires advanced skills in business intelligence. Thus, although they have interesting features to manage and analyze multidimensional data, their complexity makes them difficult to handle for potential users who are not computer scientists. In the literature, several studies have examined automatic multidimensional design, but the examples provided by these works concern traditional data. Moreover, other articles address multidimensional modeling adapted to complex data (inconsistency, heterogeneous data, spatial objects, texts, images within a warehouse...), but the proposed methods are rarely automatic. The aim of this thesis is to provide an automatic design method for data warehouses and OLAP cubes. This method must be able to take into account the inherent complexity of biological data. To test the prototypes proposed in this thesis, we prepared a data set concerning bird abundance along the Loire. This data set is structured as follows: (1) we have the census of 213 bird species (described with a set of qualitative factors, such as diet) at 198 points along the river for 4 census campaigns; (2) each of the 198 points is described by a set of environmental variables from different sources (land surveys, satellite images, GIS). These environmental variables raise the most important issues in terms of multidimensional modeling. They come from different sources, sometimes independent of the bird census campaigns, and are inconsistent in time and space. Moreover, these data are heterogeneous: they can be qualitative factors, quantitative variables or spatial objects. Finally, these environmental data include a large number of attributes (158 selected variables) (...)
Bouchakri, Rima. "Conception physique statique et dynamique des entrepôts de données". Thesis, Chasseneuil-du-Poitou, Ecole nationale supérieure de mécanique et d'aérotechnique, 2015. http://www.theses.fr/2015ESMA0012/document.
Data warehouses store a huge amount of data in a single location. They are interrogated by complex decisional queries called star join queries. To optimize such queries, several works propose algorithms for selecting optimization techniques such as binary join indexes and horizontal partitioning during the DW physical design. However, these works propose static algorithms, select optimization techniques in an isolated way and focus on optimizing a single objective, which is query performance. Our main contribution in this thesis is to propose a new vision of optimization technique selection. Our first contribution is an incremental selection that continuously updates the optimization scheme implemented on the DW, to ensure the continual optimization of queries. To deal with the increase in query complexity, our second contribution is a joint incremental selection of two optimization techniques, which covers the optimization of a maximum number of queries and respects the optimization constraints. Finally, we note that the incremental selection generates a maintenance cost to update the optimization schemes. Thus, our third proposition is to formulate and resolve a multi-objective selection problem for optimization techniques with two objectives to optimize: query performance and the maintenance cost of the DW.
Boly, Aliou. "Fonctions d'oubli et résumés dans les entrepôts de données". Paris, ENST, 2006. http://www.theses.fr/2006ENST0049.
The amount of data stored in data warehouses grows very quickly, so that they get saturated. To overcome this problem, the solution is generally to archive older data when new data arrive and there is no space left. This solution is not satisfactory because data mining analyses based on long-term historical data become impossible. As a matter of fact, data mining analysis cannot be done on archived data without re-loading them into the data warehouse, and the cost of loading back a large dataset of archived data is too high to be incurred for just one analysis. So, archived data must be considered as lost data as far as data mining applications are concerned. In this thesis, we propose a solution to this problem: a language is defined to specify forgetting functions on older data. The specifications include the definition of summaries of deleted data that define what data should be present in the data warehouse at each point in time. These summaries are aggregates and samples of deleted data and are kept in the data warehouse. The goal of these forgetting functions is to control the size of the data warehouse. This control is provided both for the aggregate summaries and for the samples. The specification language for forgetting functions is defined in the context of relational databases. Once forgetting functions have been specified, the data warehouse is automatically updated in order to follow the specifications. This thesis presents the specification language, the structure of the summaries, the algorithms to update the data warehouse and the possibility of performing interesting analyses of historical data.
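A minimal sketch of a forgetting function in the spirit described above: detail rows older than a retention horizon are replaced by per-group aggregates plus a bounded random sample. The table layout, group key and retention length are illustrative assumptions, not the thesis's specification language.

```python
import random
from collections import defaultdict

def forget(rows, now, retention_days=365, sample_size=100):
    """rows: list of dicts with 'day' (int timestamp in days), 'store', 'amount'."""
    kept, old = [], []
    for r in rows:
        (kept if now - r["day"] <= retention_days else old).append(r)

    # Aggregate summary of the forgotten rows (count and sum per store and coarse month bucket).
    aggregates = defaultdict(lambda: {"count": 0, "sum_amount": 0.0})
    for r in old:
        key = (r["store"], r["day"] // 30)
        aggregates[key]["count"] += 1
        aggregates[key]["sum_amount"] += r["amount"]

    # Bounded-size sample of the forgotten rows, so detailed analyses remain possible.
    sample = random.sample(old, min(sample_size, len(old)))
    return kept, dict(aggregates), sample

if __name__ == "__main__":
    data = [{"day": d, "store": "s1", "amount": float(d % 7)} for d in range(0, 800, 5)]
    kept, agg, sample = forget(data, now=800)
    print(len(kept), "detail rows kept,", len(agg), "aggregate buckets,", len(sample), "sampled rows")
```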
Badri, Mohamed. "Maintenance des entrepôts de données issus de sources hétérogènes". Paris 5, 2008. http://www.theses.fr/2008PA05S006.
This work has been carried out in the field of data warehouses (DW). DWs are at the core of decision-making information systems and are used to support decision-making tools (OLAP, data mining, reporting). A DW is a living entity whose content is continuously fed and refreshed. Updating the aggregates of a DW is crucial for decision making. That is why DW maintenance has a strategic place in the decision system process. It is also used as a performance criterion of a DW system. Since communication technologies, especially the Internet, are steadily growing, data are becoming more and more heterogeneous and distributed. We can classify them into three categories: structured data, semi-structured data and unstructured data. In this work, we first present a modelling approach with the aim of integrating all these data. On the basis of this approach, we then propose a process that ensures incremental maintenance of warehouse data and aggregates. We also propose a tree structure to manage aggregates, as well as algorithms that ensure its evolution. Being in a context of heterogeneity, all our proposals are independent of the warehouse model and of its management system. In order to validate our contribution, the Heterogeneous Data Integration and Maintenance (HDIM) prototype has been developed and some experiments performed.
Aouiche, Kamel. "Techniques de fouille de données pour l'optimisation automatique des performances des entrepôts de données". Lyon 2, 2005. http://theses.univ-lyon2.fr/documents/lyon2/2005/aouiche_k.
With the development of databases in general and data warehouses in particular, it becomes very important to reduce the administration function. The aim of auto-administrative systems is to administer and adapt themselves automatically, without loss, or even with a gain, in performance. The idea of using data mining techniques to extract useful knowledge for administration from the data themselves has been in the air for some years; however, no such research had actually been carried out. As far as we know, it nevertheless remains a very promising approach, notably in the field of data warehousing, where queries are very heterogeneous and cannot be interpreted easily. The aim of this thesis is to study auto-administration techniques in databases and data warehouses, mainly performance optimization techniques such as indexing and view materialization, and to look for a way of extracting, from the stored data themselves, useful knowledge to apply these techniques. We have designed a tool that finds an index and view configuration allowing data access time to be optimized. Our tool searches for frequent itemsets in a given workload and clusters the query workload to compute this index and view configuration. Finally, we have extended performance optimization to XML data warehouses. In this area, we proposed an indexing technique that precomputes joins between XML facts and dimensions, and adapted our materialized view selection strategy to XML materialized views.
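A hedged sketch of the workload-mining idea described above: attributes used together in queries are counted as itemsets, and frequent ones become candidate indexes or materialized views. The toy workload and support threshold are assumptions for illustration only, not the tool's actual algorithm.

```python
from collections import Counter
from itertools import combinations

# Each query is reduced to the set of attributes it uses (hypothetical example workload).
workload = [
    {"sales.store", "sales.day", "sales.amount"},
    {"sales.store", "sales.product", "sales.amount"},
    {"sales.store", "sales.day", "sales.amount"},
    {"sales.product", "sales.amount"},
]

def frequent_itemsets(queries, min_support=2, max_size=3):
    counts = Counter()
    for attrs in queries:
        for k in range(1, min(max_size, len(attrs)) + 1):
            for combo in combinations(sorted(attrs), k):
                counts[combo] += 1
    return {itemset: c for itemset, c in counts.items() if c >= min_support}

if __name__ == "__main__":
    for itemset, support in sorted(frequent_itemsets(workload).items(), key=lambda x: -x[1]):
        print(support, itemset)   # frequent attribute sets = index / view candidates
```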
Khrouf, Kaïs. "Entrepôts de documents : de l'alimentation à l'exploitation". Toulouse 3, 2004. http://www.theses.fr/2004TOU30109.
In this thesis, we propose the concept of a document warehouse, which consists in storing heterogeneous, selected and filtered documents and classifying them according to generic logical structures (structures common to a set of documents). Such a warehouse organization facilitates the exploitation of the integrated documentary information through several complementary techniques: information retrieval, which consists in returning document granules in response to a query formulated with keywords (free language); data interrogation, which consists in returning factual data (structure or content) using a declarative language; and multidimensional analysis, which consists in manipulating warehouse information according to non-predefined dimensions. To validate our propositions, we developed an aid tool, DOCWARE (DOCument WAREhouse), for the integration and analysis of documents.
Wehrle, Pascal. "Modèle multidimensionnel et OLAP sur architecture de grille". Lyon, INSA, 2009. http://theses.insa-lyon.fr/publication/2009ISAL0002/these.pdf.
Data warehouses and OLAP (On-Line Analytical Processing) systems allow fast and aggregated access to large volumes of data for analysis purposes. In order to further increase the performance of decision support systems, one solution consists in implementing data warehouses on more and more powerful distributed systems. Computing grids in particular provide significant amounts of storage and computing resources. The deployment of a data warehouse on a decentralized grid infrastructure, however, requires adjustments of the multidimensional data model and of the OLAP processes to take into account the distribution and replication of warehouse data and their aggregates. We introduce an identification model for distributed warehouse data and an indexing method representing the data in the form of multidimensional blocks. This index structure is based on spatial X-tree indexes and cuboid lattices and allows the localization of materialized data as well as computable aggregates on the various grid nodes. We propose an OLAP query execution method aiming at the construction of an optimized query execution plan from a list of candidate blocks providing parts of the query result. Finally, we define a grid services architecture named GIROLAP (Grid Infrastructure for Relational OLAP), which is integrated with the Globus middleware and deployed in the context of the GGM project (Grid for Geno-Medicine) funded by the ACI "Masse de Données".
Triki, Salah. "Sécurisation des entrepôts de données : de la conception à l’exploitation". Thesis, Lyon 2, 2013. http://www.theses.fr/2013LYO22026.
Companies have to make strategic decisions that involve competitive advantages. In the context of decision making, the data warehouse concept emerged in the nineties. A data warehouse is a special kind of database that consolidates and historizes data from the operational information system of a company. Moreover, a company's data are proprietary and sensitive and should not be disclosed without control. Indeed, some data are personal and may harm their owners when they are disclosed, for example medical data or religious or ideological beliefs. Thus, many governments have enacted laws to protect the private lives of their citizens. Faced with these laws, organizations are therefore forced to implement strict security measures to comply with them. Our work takes place in the context of secure data warehouses, which can be addressed at two levels: (i) the design level, which aims to develop a secure data storage schema, and (ii) the operating level, which aims to enforce access rights and user entitlements and to prevent a user from inferring prohibited data from the data he or she is allowed to access. For securing the design level, we have made three contributions. The first contribution is a specification language for secure storage. This language is a UML profile called SECDW+, an extended version of SECDW that takes conflicts of interest into account at design level. SECDW is a UML profile for specifying security concepts in a data warehouse by adopting the standard RBAC and MAC security models. Although SECDW allows the designer to specify which role has access to which part of the data warehouse, it does not take conflicts of interest into account. Thus, through stereotypes and tagged values, we extended SECDW to allow the definition of conflicts of interest for the various elements of a multidimensional model. Our second contribution at this level is an approach to detect potential inferences at design time. Our approach is based on the class diagram of the data sources to detect inferences at the conceptual level. Note that preventing inferences at this level reduces the cost of administering the OLAP server used to manage access to the data warehouse. Finally, our third contribution to the design of a secure warehouse consists of rules for analyzing the consistency of the modeled authorizations. At the operating level, we proposed: an architecture for enforcing the configured permissions, a method for the prevention of inferences, and a method to respect the constraints of additive measures. The proposed architecture adds to the access control system, typically present in any secure DBMS, a module that prevents inferences; this module implements our security methods against inferences and enforces the additivity constraints. Our method for preventing inferences handles both types of inference: precise and partial. For precise inferences, our method is based on Bayesian networks. It builds Bayesian networks corresponding to user queries using the MAX and MIN functions, and prohibits those that are likely to generate inferences. We proposed a set of definitions to translate the result of a query into Bayesian networks. Based on these definitions, we developed algorithms for constructing Bayesian networks and prohibiting those that are likely to generate inferences. In addition, to keep the prevention treatment within a reasonable response time, we proposed a technique for predicting the potential queries to prohibit. The technique is based on query history frequencies to determine the most common query that could follow the request being processed. In addition to precise inferences (performed through queries using the MIN and MAX functions), our method also addresses partial inferences made through queries using the SUM function. Inspired by statistical techniques, our method relies on the distribution of data in the warehouse to decide whether to prohibit or allow the execution of queries.
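A toy illustration of the kind of SUM-based partial inference such a method guards against: if two aggregate sums that differ by exactly one individual are both allowed, their difference reveals that individual's value. The table, query set and check are hypothetical and much simpler than the thesis's statistical approach.

```python
salaries = {"alice": 3200, "bob": 2900, "carol": 4100}   # sensitive detail data

def sum_query(names):
    return sum(salaries[n] for n in names)

def reveals_individual(query_a, query_b):
    """True if the two allowed SUM queries differ by exactly one individual."""
    diff = set(query_a) ^ set(query_b)
    return len(diff) == 1

if __name__ == "__main__":
    q1 = ["alice", "bob", "carol"]        # SUM over the whole department
    q2 = ["alice", "bob"]                 # SUM over the department minus one person
    if reveals_individual(q1, q2):
        print("blocked: difference would expose", (set(q1) ^ set(q2)).pop(),
              "=", sum_query(q1) - sum_query(q2))
```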
Dehdouh, Khaled. "Entrepôts de données NoSQL orientés colonnes dans un environnement cloud". Thesis, Lyon 2, 2015. http://www.theses.fr/2015LYO22018.
The work presented in this thesis aims at proposing approaches to build data warehouses using the columnar NoSQL model. The use of NoSQL models is motivated by the advent of big data and the inability of the relational model, usually used to implement data warehouses, to allow data scalability. Indeed, NoSQL models are suitable for storing and managing massive data. They were designed to build databases whose storage model is the "key/value" model. Other models then appeared to account for the variability of the data: column-oriented, document-oriented and graph-oriented. We have used the column-oriented NoSQL model for building massive data warehouses because it is more suitable for decisional queries, which are defined by a set of columns (measures and dimensions) from the warehouse. However, columnar NoSQL models do not offer online analytical processing (OLAP) operators for exploiting the data warehouse. We present in this thesis new solutions for the logical and physical modeling of columnar NoSQL data warehouses. We have proposed a new approach that allows data cubes to be built by taking the characteristics of the columnar environment into account. Thus, we have defined new cube operators which allow columnar cubes to be built: C-CUBE (Columnar-CUBE) for columnar relational data warehouses; MC-CUBE (MapReduce Columnar-CUBE) for columnar NoSQL data warehouses when measures and dimensions are stored in different tables; and finally CN-CUBE (Columnar NoSQL-CUBE) when measures and dimensions are gathered in the same table according to a new logical model that we proposed. We have studied the performance of the NoSQL dimensional data model and of our OLAP operators, and we have proposed a new star join index, C-SJI (Columnar-Star Join Index), suitable for columnar NoSQL data warehouses which store measures and dimensions separately. To evaluate our contribution, we have defined a cost model to measure the impact of the use of this index. Furthermore, we have proposed a logical model called FLM (Flat Logical Model) to represent a column-oriented NoSQL data cube and enable better management by columnar NoSQL DBMSs. To validate our contributions, we have developed a software framework, CG-CDW (Cube Generation for Columnar Data Warehouses), to generate OLAP cubes from columnar data warehouses. Also, we have developed a columnar NoSQL decisional benchmark, CNSSB (Columnar NoSQL Star Schema Benchmark), based on the SSB, and finally we conducted several tests that have shown the effectiveness of the different aggregation operators that we proposed.
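A small sketch of the map/reduce flavour of cube construction evoked by operators such as MC-CUBE: rows stored as column-family-like dicts are mapped to (dimension tuple, measure) pairs, then reduced by summation to obtain one cuboid. Field names and the grouping are illustrative assumptions, not the thesis's operators.

```python
from collections import defaultdict

rows = [
    {"dim": {"city": "Lyon",  "year": "2014"}, "measures": {"amount": 10.0}},
    {"dim": {"city": "Lyon",  "year": "2014"}, "measures": {"amount": 5.0}},
    {"dim": {"city": "Paris", "year": "2015"}, "measures": {"amount": 7.5}},
]

def map_phase(row, group_by):
    # Emit the grouping key (a tuple of dimension members) and the measure value.
    key = tuple(row["dim"][d] for d in group_by)
    return key, row["measures"]["amount"]

def reduce_phase(pairs):
    # Sum the measure for each grouping key to build one cuboid of the cube.
    cuboid = defaultdict(float)
    for key, value in pairs:
        cuboid[key] += value
    return dict(cuboid)

if __name__ == "__main__":
    pairs = [map_phase(r, group_by=("city", "year")) for r in rows]
    print(reduce_phase(pairs))   # {('Lyon', '2014'): 15.0, ('Paris', '2015'): 7.5}
```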
Csernel, Baptiste. "Résumé généraliste de flux de données". Paris, ENST, 2008. http://www.theses.fr/2008ENST0048.
This thesis deals with the creation and management of general-purpose summaries built from data streams. It is centered on the development of two algorithms, one designed to produce general-purpose summaries for a single data stream, and the other for three data streams sharing relational information. A data stream is defined as a real-time, continuous, ordered sequence of items. It is impossible to control the order in which items arrive, nor is it feasible to locally store a stream in its entirety. Such data streams appear in many applications, such as utility networks and IT, or in monitoring tasks, for instance in meteorology, geology or even finance. The first step in this work is to define the meaning of a general-purpose data stream summary. The first property of such a summary is that it should be suitable for a variety of data mining and querying tasks. The second one is that it should be possible to build, from the main summary, a summary concerning only a selected portion of the stream encountered so far. The first algorithm designed, StreamSamp, is a general-purpose summary algorithm dealing with a single data stream and based on the principle of sampling. The second algorithm, CrossStream, is a general-purpose summary algorithm dealing with three data streams sharing relational information with one another, one relation stream linking two entity streams. This algorithm is based on the use of micro-clusters, inspired by the CluStream algorithm designed by Aggarwal, combined with the use of Bloom filters. Both algorithms were implemented and tested against various sets of data to assess their performance in a number of situations.
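For readers unfamiliar with the building block mentioned last, here is a small self-contained Bloom filter, the kind of compact membership structure CrossStream combines with micro-clusters; the sizes and hashing scheme below are arbitrary assumptions, not the thesis's parameters.

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item):
        # Derive num_hashes bit positions from salted SHA-256 digests of the item.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))

if __name__ == "__main__":
    seen = BloomFilter()
    for entity_id in ("customer#17", "customer#42"):
        seen.add(entity_id)
    print("customer#17" in seen, "customer#99" in seen)   # True, False (with high probability)
```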
Ahmed, Taher Omran. "Continuité spatiotemporelle dans les entrepôts de données et les modèles multidimensionnels". Lyon 1, 2006. http://www.theses.fr/2006LYO10113.
Decision support systems are usually based on multidimensional structures. Facts are stored in structures called hypercubes. Dimensions play the role of axes along which these facts are analyzed and form a space in which a fact is located by a set of coordinates. Conventional multidimensional structures deal with discrete facts linked to discrete dimensions. However, when dealing with natural continuous phenomena, the discrete representation is not adequate. There is a need to integrate spatiotemporal continuity within multidimensional structures to enable the analysis and exploration of continuous field data. In this thesis, we deal with defining a formal multidimensional model for continuous field data. Our model is based on the notion of basic cubes, which contain data at the lowest level of detail. Two types of basic cubes were defined: discrete and continuous. Higher-level hypercubes are built by applying aggregation operations to basic cubes. New aggregation operations were also defined.
Ben, Meftah Salma. "Structuration sémantique de documents XML centrés-documents". Thesis, Toulouse 1, 2017. http://www.theses.fr/2017TOU10061/document.
The English abstract was not provided by the author.
Boussahoua, Mohamed. "Optimisation de performances dans les entrepôts de données distribués NoSQL en colonnes". Thesis, Lyon, 2020. http://www.theses.fr/2020LYSE2007.
The work presented in this thesis aims at proposing approaches to build data warehouses (DWs) using the columnar NoSQL model. The use of NoSQL models is motivated by the advent of big data and the inability of the relational model, usually used to implement DWs, to allow data scalability. Indeed, NoSQL models are suitable for storing and managing massive data. They were designed to build databases whose storage model is the "key/value" model. Other models then appeared to account for the variability of the data: column-oriented, document-oriented and graph-oriented. We have used the column-oriented NoSQL model for building massive DWs because it is more suitable for decisional queries, which are defined by a set of columns (measures and dimensions) from the warehouse. Column-family NoSQL databases offer storage techniques that are well adapted to DWs, and several scenarios are possible to develop DWs on these databases. We present in this thesis new solutions for the logical and physical modeling of columnar NoSQL data warehouses. We have proposed a logical model called NLM (Naive Logical Model) to represent a column-oriented NoSQL DW and enable better management by columnar NoSQL DBMSs. We have proposed a new method to build a distributed DW using a column-family NoSQL database. Our method is based on a strategy of grouping attributes from the fact table and dimensions into column families. For this purpose, we used two algorithms: the first one is a meta-heuristic algorithm, in this case Particle Swarm Optimization (PSO), and the second one is the k-means algorithm. Furthermore, we have proposed a new method to build an efficient distributed DW inside column-family NoSQL DBMSs. This method is based on association rules, which allow groups of frequently used attributes in the workload to be obtained. Hence, the partition keys (RowKey), necessary to distribute data onto the different cluster nodes, are composed of those attribute groups. To validate our contributions, we have developed a software tool called RDW2CNoSQ (Relational Data Warehouse to Columnar NoSQL) to build a distributed data warehouse using a column-family NoSQL database. Also, we conducted several tests that have shown the effectiveness of the different methods that we proposed. Our experiments suggest that defining good data partitioning and placement schemes during the implementation of the data warehouse with NoSQL HBase significantly increases computation and querying performance.
Serna, Encinas María Trinidad. "Entrepôts de données pour l'aide à la décision médicale : conception et expérimentation". Université Joseph Fourier (Grenoble), 2005. http://www.theses.fr/2005GRE10083.
Data warehouses integrate information coming from different data sources, which are often heterogeneous and distributed. Their main goal is to provide a global view for analysts and managers to make decisions based on data sets and historical logs. The design and construction of a data warehouse are composed of three phases: extraction-integration, organisation and interrogation. In this thesis, we are interested in the latter two. For us, the organisation is a complex and delicate task; hence, we divide it into two parts: data structuring and data management. For structuring, we propose a multidimensional model composed of three classes: Cube, Dimension and Hierarchy. We also propose an algorithm for selecting the optimal set of materialized views. We consider that data management should include warehouse evolution. The concept of schema evolution was adapted here, and we propose to use bitemporal schema versions for the management, storage and visualization of current and historical data (intensional and extensional). Finally, we have implemented a graphical interface that allows semi-automatic query generation (indicators). These queries (for example, "number of patients by hospital and disease") are determined by the application domain. We had the opportunity to work in a medical project; it allowed us to verify and validate our proposition using real data.
Mahboubi, Hadj. "Optimisation de la performance des entrepôts de données XML par fragmentation et répartition". Phd thesis, Université Lumière - Lyon II, 2008. http://tel.archives-ouvertes.fr/tel-00350301.
To reach this objective, we propose in this dissertation to jointly overcome these limitations by fragmentation and then by distribution over a data grid. To this end, we first focused on the fragmentation of XML data warehouses and proposed methods that are, to our knowledge, the first contributions in this field. These methods exploit an XQuery workload to deduce a derived horizontal fragmentation schema.
We first proposed to adapt the most efficient techniques from the relational domain to XML data warehouses, and then an original fragmentation method based on the k-means clustering technique. The latter allowed us to control the number of fragments. We finally proposed an approach for distributing an XML data warehouse over a grid. These proposals led us to define a reference model for XML data warehouses that unifies and extends the models existing in the literature.
Finally, we chose to validate our methods experimentally. To this end, we designed and developed a benchmark for XML data warehouses: XWeB. The experimental results we obtained show that we have reached our objective of controlling the volume of XML data and the processing time of complex decision-support queries. They also show that our k-means-based fragmentation method outperforms classical derived horizontal fragmentation methods, both in terms of performance gain and in terms of algorithm overhead.
Ben, Messaoud Riadh. "Couplage de l'analyse en ligne et de la fouille de données pour l'exploration, l'agrégation et l'explication des données complexes". Lyon 2, 2006. http://theses.univ-lyon2.fr/documents/lyon2/2006/benmessaoud_r.
Data warehouses provide efficient solutions for the management of huge amounts of data. Online analytical processing (OLAP) is a key feature in data warehouses which provides users with visual tools to explore data cubes. Users are thus able to extract relevant information for their decision making. On the other hand, data mining offers automatic learning techniques in order to produce comprehensive knowledge covering descriptions, clusterings and explanations. The idea of combining online analytical processing and data mining is a promising solution to improve the decision-making process, especially in the case of complex data. In fact, OLAP and data mining can be two complementary fields that interact within a single analysis process. The aim of this thesis is to propose new approaches for decision support based on coupling online analytical processing and data mining. To do so, we have established three main proposals. The first one concerns the visualization of sparse data. Using multiple correspondence analysis, we reduced the negative effect of sparsity by reorganizing the cells of a data cube. Our second proposal provides a new aggregation of facts in a data cube by using agglomerative hierarchical clustering. The obtained aggregates are semantically richer than those provided by traditional multidimensional structures. Our third proposal tries to explain possible relationships within multidimensional data by using association rules. We have designed a new algorithm for guided mining of association rules in data cubes. We have also developed a software platform which includes our theoretical contributions. In addition, we provided a case study on complex data in order to validate our approaches. Finally, based on an OLAP algebra, we have designed the first principles of a general formal framework which models the problem of coupling online analytical processing and data mining.
Bentayeb, Fadila. "Entrepôts et analyse en ligne de données complexes centrés utilisateur : un nouveau défi". Habilitation à diriger des recherches, Université Lumière - Lyon II, 2011. http://tel.archives-ouvertes.fr/tel-00752126.
Texto completoKerkad, Amira. "L'interaction au service de l'optimisation à grande échelle des entrepôts de données relationnels". Phd thesis, ISAE-ENSMA Ecole Nationale Supérieure de Mécanique et d'Aérotechique - Poitiers, 2013. http://tel.archives-ouvertes.fr/tel-00954469.
Texto completoKermanshahani, Shokoh. "IXIA (IndeX-based Integration Approach) : une approche hybride pour l'intégration des données". Université Joseph Fourier (Grenoble), 2009. http://www.theses.fr/2009GRE10114.
There is a large and increasing volume of documents, data sources and database management systems available in the world, and many autonomous and heterogeneous sources describe the same reality while using different words and conceptual structures. Many organizations need a system that handles such data in a homogeneous way, which necessitates the integration of these data sources. The goal of a data integration system is to develop a homogeneous interface for end users to query several heterogeneous and autonomous sources. Building such a homogeneous interface raises many challenges, among which the heterogeneity of data sources, the fragmentation of data, and the processing and optimization of queries appear to be the most important. Many research projects present different approaches, and each of them proposes a solution to each of these problems. Depending on the integrated view, these approaches can be categorized into two main categories, materialized and virtual approaches; there are also some hybrid approaches that compose materialized and virtual views. The main advantage of a hybrid approach is to offer a trade-off between query response time and data freshness in a data integration system. In existing approaches, query optimization is often privileged for the materialized part of the system. In this thesis, we develop a hybrid approach which aims to extend query optimization to all the queries of the integration system. It also provides a flexible data-refreshing mechanism in order to tolerate the different characteristics of the sources and their data. This approach is based on the Osiris object indexing system. Osiris is a database and knowledge base platform with a specific object data model based on a hierarchy of views. Its indexing system relies on the partitioning of the object space using the view constraints. IXIA, the hybrid approach presented in this thesis, materializes the indexing structure of the underlying objects at the mediator level. The OIDs of the objects, their correspondence with the source objects and the data needed to refresh the indexing structure are also materialized. Our index-based data integration approach offers more flexibility in data refreshing than a fully materialized approach and a better query response time in comparison with a fully virtual data integration system.
Naoum, Lamiaa. "Un modèle multidimensionnel pour un processus d'analyse en ligne de résumés flous". Nantes, 2006. http://www.theses.fr/2006NANT2101.
Texto completoFavre, Cécile. "Evolution de schémas dans les entrepôts de données : mise à jour de hiérarchies de dimension pour la personnalisation des analyses". Lyon 2, 2007. http://theses.univ-lyon2.fr/documents/lyon2/2007/favre_c.
In this thesis, we propose a solution to personalize analyses in data warehousing. This solution is based on schema evolution driven by users. More precisely, it consists in acquiring users' knowledge and integrating it into the data warehouse to build new analysis axes. To achieve that, we propose a formal rule-based evolving data warehouse model. The rules are named aggregation rules. To exploit this model, we propose an architecture that supports the personalization process. This architecture includes four modules: acquisition of users' knowledge in the form of if-then rules; integration of these rules into the data warehouse; schema evolution; and online analysis on the new schema. To realize this architecture, we propose an execution model in the relational context to deal with the whole process of the global architecture. Besides, we were interested in the evaluation of our evolving model. To do that, we propose an incremental updating method for a given workload in response to the data warehouse schema evolution. To validate our proposals, we developed the WEDriK (data Warehouse Evolution Driven by Knowledge) platform. The problems addressed in this thesis come from the reality of the LCL bank.
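A hedged illustration of the "if-then aggregation rule" idea described above: user knowledge maps an existing dimension level to a new, coarser analysis level, which is then used to roll up a measure. The agency/region names and the rule encoding are hypothetical examples, not the WEDriK platform's syntax.

```python
from collections import defaultdict

# if agency in {...} then region = "..."  (user-supplied knowledge)
aggregation_rules = [
    ({"agency_01", "agency_02"}, "region_north"),
    ({"agency_03"},              "region_south"),
]

def region_of(agency):
    for agencies, region in aggregation_rules:
        if agency in agencies:
            return region
    return "region_unknown"

def roll_up(facts):
    """facts: list of (agency, amount) pairs; returns totals per new 'region' level."""
    totals = defaultdict(float)
    for agency, amount in facts:
        totals[region_of(agency)] += amount
    return dict(totals)

if __name__ == "__main__":
    facts = [("agency_01", 120.0), ("agency_02", 80.0), ("agency_03", 45.5)]
    print(roll_up(facts))   # {'region_north': 200.0, 'region_south': 45.5}
```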
Galhardas, Héléna. "Nettoyage de données : modèle, langage déclaratif et algorithmes". Versailles-St Quentin en Yvelines, 2001. http://www.theses.fr/2001VERS0032.
The problem of data cleaning, which consists in removing inconsistencies and errors from original data sets, is well known in the area of decision support systems and data warehouses. This holds regardless of the application: relational database joining, web-related, or scientific. In all cases, existing ETL (Extraction Transformation Loading) and data cleaning tools for writing data cleaning programs are insufficient. The main challenge is the design and implementation of a data flow graph that effectively generates clean data. Needed improvements to the current state of the art include (i) a clear separation between the logical specification of data transformations and their physical implementation, (ii) debugging of the reasoning behind cleaning results, and (iii) interactive facilities to tune a data cleaning program. This thesis presents a language, an execution model and algorithms that enable users to express data cleaning specifications declaratively and perform the cleaning efficiently. We use as an example a set of bibliographic references used to construct the Citeseer web site. The underlying data integration problem is to derive structured and clean textual records so that meaningful queries can be performed. Experimental results report on the assessment of the proposed framework for data cleaning.
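A toy cleaning step in the spirit of the bibliographic example above: raw reference strings are normalized and near-duplicates are flagged with a string-similarity threshold. The records and the 0.85 threshold are illustrative assumptions; the thesis proposes a declarative language and execution model, not this ad-hoc script.

```python
import re
from difflib import SequenceMatcher

def normalize(ref):
    ref = ref.lower()
    ref = re.sub(r"[^a-z0-9 ]", " ", ref)       # drop punctuation
    return re.sub(r"\s+", " ", ref).strip()     # collapse whitespace

def near_duplicates(references, threshold=0.85):
    cleaned = [normalize(r) for r in references]
    pairs = []
    for i in range(len(cleaned)):
        for j in range(i + 1, len(cleaned)):
            if SequenceMatcher(None, cleaned[i], cleaned[j]).ratio() >= threshold:
                pairs.append((references[i], references[j]))
    return pairs

if __name__ == "__main__":
    refs = [
        "Galhardas, H. Data Cleaning: Model, Language and Algorithms, 2001.",
        "Galhardas H. Data cleaning - model, language and algorithms (2001)",
        "Smith, J. An unrelated paper, 1999.",
    ]
    print(near_duplicates(refs))   # the two variants of the same reference are paired
```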
Boukhalfa, Kamel. "De la conception physique aux outils d'administration et de tuning des entrepôts de données". Phd thesis, Chasseneuil-du-Poitou, Ecole nationale supérieure de mécanique et d'aéronautique, 2009. http://tel.archives-ouvertes.fr/tel-00410411.
Texto completoMots clés : Conception physique, Tuning, Techniques d'optimisation, Fragmentation Horizontale, Index de Jointure Binaires.
Pacitti, Esther. "Réplication asynchrone des données dans trois contextes: entrepôts, grappes et systèmes pair-à-pair". Habilitation à diriger des recherches, Université de Nantes, 2008. http://tel.archives-ouvertes.fr/tel-00473969.
Texto completoAbdelhédi, Fatma. "Conception assistée d’entrepôts de données et de documents XML pour l’analyse OLAP". Thesis, Toulouse 1, 2014. http://www.theses.fr/2014TOU10005/document.
Today, data warehouses are a major issue for business intelligence applications within companies. The sources of a warehouse, i.e. the origin of the data that feed it, are diverse and heterogeneous: sequential files, spreadsheets, relational databases, Web documents. The complexity is such that the software on the market only partially meets the needs of decision makers when they want to analyze the data. Therefore, our work falls within the context of decision support systems that integrate all data types (mainly extracted from relational databases and XML document databases) for decision makers. They aim to provide models, methods and software tools to elaborate and manipulate data warehouses. Our work has specifically focused on two complementary issues: the assisted design of data warehouses, and the modeling and OLAP analysis of XML documents.
Naouali, Sami. "Enrichissement d'entrepôts de données par la connaissance : application au web". Nantes, 2004. http://www.theses.fr/2004NANT2093.
Texto completoJouhet, Vianney. "Automated adaptation of Electronic Heath Record for secondary use in oncology". Thesis, Bordeaux, 2016. http://www.theses.fr/2016BORD0373/document.
With the increasing adoption of Electronic Health Records (EHR), the amount of data produced at the patient bedside is rapidly increasing. Secondary use is thereby an important field to investigate in order to facilitate research and evaluation. In this work we discuss issues related to data representation and semantics within EHRs that need to be addressed in order to facilitate the secondary use of structured data in oncology. We propose and evaluate ontology-based methods for the integration of heterogeneous diagnosis terminologies in oncology. We then extend the obtained model to enable the representation of tumoral diseases and their links with diagnoses as recorded in EHRs. We then propose and implement a complete architecture combining a clinical data warehouse, a metadata registry, and semantic web technologies and standards. This architecture enables the syntactic and semantic integration of a broad range of hospital information system observations. Our approach links data with external knowledge (an ontology) in order to provide a knowledge resource for an algorithm that identifies tumoral diseases based on the diagnoses recorded within EHRs. As it is based on the ontology classes, the identification algorithm uses an integrated view of diagnoses (avoiding semantic heterogeneity). The proposed architecture, leading to an algorithm on top of an ontology, offers a flexible solution: adapting the ontology, for instance by modifying its granularity, provides a way of adapting aggregation depending on specific needs.
Garcelon, Nicolas. "Problématique des entrepôts de données textuelles : dr Warehouse et la recherche translationnelle sur les maladies rares". Thesis, Sorbonne Paris Cité, 2017. http://www.theses.fr/2017USPCB257/document.
The repurposing of clinical data for research has become widespread with the development of clinical data warehouses. These data warehouses are modeled to integrate and explore structured data related to thesauri. These data come mainly from machines (biology, genetics, cardiology, etc.) but also from manual data input forms. Care production also provides a large amount of textual data from hospital reports (hospitalization, surgery, imaging, anatomopathology, etc.) and from free-text areas in electronic forms. This mass of data, little used by conventional warehouses, is an indispensable source of information in the context of rare diseases. Indeed, free text makes it possible to describe the clinical picture of a patient with more precision and to express the absence of signs and uncertainty. In particular, for patients still undiagnosed, the doctor describes the patient's medical history outside any nosological framework. This wealth of information makes clinical text a valuable source for translational research. However, this requires appropriate algorithms and tools to enable optimized re-use by doctors and researchers. We present in this thesis the data warehouse centered on the clinical document, which we have modeled, implemented and evaluated. In three use cases for translational research in the context of rare diseases, we attempted to address the problems inherent in textual data: (i) recruitment of patients through a search engine adapted to textual data (negation and family history detection), (ii) automated phenotyping from textual data, and (iii) diagnosis by similarity between patients based on phenotyping. We were able to evaluate these methods on the data warehouse of Necker-Enfants Malades created and fed during this thesis, integrating about 490,000 patients and 4 million reports. These methods and algorithms were integrated into the software Dr Warehouse, developed during the thesis and distributed as open source since September 2017.
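A very small sketch of the negation-detection idea mentioned in use case (i): a clinical sign is only counted as present if no negation cue appears shortly before it. The cue list, window size and example sentences are assumptions for illustration, not Dr Warehouse's actual method.

```python
import re

NEGATION_CUES = r"\b(no|not|without|denies|absence of)\b"

def mentions_sign(text, sign, window=40):
    """Return True if `sign` occurs and is not preceded by a negation cue within `window` chars."""
    for match in re.finditer(re.escape(sign), text, flags=re.IGNORECASE):
        left_context = text[max(0, match.start() - window):match.start()]
        if not re.search(NEGATION_CUES, left_context, flags=re.IGNORECASE):
            return True   # at least one non-negated mention
    return False

if __name__ == "__main__":
    report_1 = "Clinical exam: absence of hepatomegaly, mild fever."
    report_2 = "Clinical exam shows hepatomegaly and mild fever."
    for report in (report_1, report_2):
        print(mentions_sign(report, "hepatomegaly"), "-", report)
```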
Mathieu, Jean. "Intégration de données temps-réel issues de capteurs dans un entrepôt de données géo-décisionnel". Thesis, Université Laval, 2011. http://www.theses.ulaval.ca/2011/28019/28019.pdf.
In the last decade, the use of sensors for measuring various phenomena has greatly increased. We can now use sensors to measure GPS position, temperature and even a person's heartbeat. Nowadays, the wide diversity of sensors makes them the best tools to gather data. Along with this effervescence, analysis tools have also advanced since the creation of transactional databases, leading to a new category of tools, analysis systems (Business Intelligence, BI), which respond to the need for global analysis of the data. Data warehouses and OLAP (On-Line Analytical Processing) tools, which belong to this category, enable users to analyze big volumes of data, execute time-based requests and build statistical graphs in a few simple mouse clicks. Although the various types of sensors can surely enrich any analysis, such data require heavy integration processes to be driven into the data warehouse, the centerpiece of any decision-making process. The different data types produced by sensors, the sensor models and the ways to transfer such data are even today significant obstacles to the integration of sensor data streams into a geo-decisional data warehouse. Also, current geo-decisional data warehouses are not initially built to welcome new data at a high frequency. Since the performance of a data warehouse is reduced during an update, new data are usually added weekly, monthly, etc. However, some data warehouses, called Real-Time Data Warehouses (RTDW), are able to be updated several times a day without their performance diminishing during the process. But this technology is not very common, is very costly and is in most cases considered a "beta" version. Therefore, this research aims to develop an approach allowing real-time sensor data streams to be published, normalized and integrated into a classic data warehouse. An optimized update strategy has also been developed so that frequently arriving new data can be added to the analysis without affecting the data warehouse's performance.
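A hedged sketch of a buffered (micro-batch) update strategy of the kind evoked above: sensor readings are accumulated and flushed to the warehouse only when a size or time threshold is reached, limiting the impact of frequent updates. The thresholds and the in-memory "warehouse" stand-in are assumptions, not the thesis's actual strategy.

```python
import time

class MicroBatchLoader:
    def __init__(self, flush_size=100, flush_interval_s=60.0):
        self.flush_size = flush_size
        self.flush_interval_s = flush_interval_s
        self.buffer = []
        self.last_flush = time.monotonic()
        self.warehouse = []                      # stand-in for the real fact table

    def ingest(self, reading):
        self.buffer.append(reading)
        now = time.monotonic()
        if len(self.buffer) >= self.flush_size or now - self.last_flush >= self.flush_interval_s:
            self.flush(now)

    def flush(self, now=None):
        self.warehouse.extend(self.buffer)       # one bulk insert instead of many small ones
        self.buffer.clear()
        self.last_flush = now if now is not None else time.monotonic()

if __name__ == "__main__":
    loader = MicroBatchLoader(flush_size=3, flush_interval_s=3600)
    for i in range(7):
        loader.ingest({"sensor": "temp-01", "value": 20.0 + i})
    print(len(loader.warehouse), "rows loaded,", len(loader.buffer), "still buffered")
```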
Salmi, Cheik. "Vers une description et une modélisation des entrées des modèles de coût mathématiques pour l'optimisation des entrepôts de données". Thesis, Chasseneuil-du-Poitou, Ecole nationale supérieure de mécanique et d'aérotechnique, 2017. http://www.theses.fr/2017ESMA0006/document.
Data warehouses (DWs) have become a mature technology. The emphasis of analysis requests is driven by technological change, new programming paradigms and Model-Driven Engineering (MDE). Before using these technological advances, the DW must be built and prepared for proper operation. The construction phase has seen massive description and meta-modeling efforts to facilitate the definition of correspondences between local data source schemas and the DW schema, and to reduce heterogeneity between sources. Despite its importance in all stages of the design life cycle of a DW, the operational phase, and in particular its physical task, has not received the same interest in terms of description and meta-modeling. During this phase, mathematical cost models are used to quantify the quality of the proposed solutions. The development of these models requires efforts to collect and analyze the relevant parameters. To simulate the operation of a DW, all the dimensions of a DBMS must be integrated. In this thesis, we propose to describe these dimensions in detail with meta-modeling mechanisms. Given the singularity of and hierarchy between storage media, we have developed an ontology dedicated to storage media, which makes their properties explicit. The similarities between these media motivated us to develop a hybrid cache based on flash memory. This increases the cache's ability to store a large number of intermediate results shared by multiple decision-support queries. The reuse of these results increases the overall performance of the DBMS. Our contributions are validated with experiments using our theoretical cost models and the Oracle DBMS.
Guérin, Émilie. "Intégration de données pour l'analyse de transcriptome : mise en œuvre par l'entrepôt GEDAW (Gene Expression Data Warehouse)". Rennes 1, 2005. http://www.theses.fr/2005REN1S169.
Texto completoDarmont, Jérôme. "Optimisation et évaluation de performance pour l'aide à la conception et à l'administration des entrepôts de données complexes". Habilitation à diriger des recherches, Université Lumière - Lyon II, 2006. http://tel.archives-ouvertes.fr/tel-00143361.
Texto completoThe work presented in this dissertation aims to propose innovative solutions for optimizing and evaluating the performance of data warehouses. We have designed a generic approach whose objective is to automatically propose to the data warehouse administrator solutions for optimizing data access times. The principle of this approach is to apply data mining techniques to a workload (a set of queries) representative of the data warehouse's usage in order to deduce a quasi-optimal configuration of indexes and/or materialized views. Cost models then make it possible to select, among these data structures, the most efficient ones in terms of the ratio between performance gain and overhead.
Furthermore, performance evaluation can support the design of data warehouses. Thus, in order to validate our approach experimentally, we have also designed several generic benchmarks. The guiding principle behind their development is adaptability. Indeed, to compare the effectiveness of different performance optimization techniques, it is necessary to test them in different environments, on different database and workload configurations, and so on. The ability to evaluate the impact of different architectural choices is also a valuable aid in designing data warehouses. Our benchmarks therefore make it possible to generate various data warehouse configurations, as well as the decision-support workloads that apply to them.
Finally, our performance optimization and evaluation solutions have been implemented in the context of relational and XML data warehouses.
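As a purely illustrative sketch of workload-driven selection (not the cost models of the dissertation), the following Python fragment counts how often attributes appear in the filters of a representative query workload and keeps the candidate single-column indexes whose estimated gain exceeds an assumed maintenance overhead. The attribute-extraction rule and the cost figures are invented for the example.

```python
import re
from collections import Counter

def candidate_indexes(workload, gain_per_hit=1.0, overhead_per_index=50.0):
    """Suggest single-column indexes whose estimated benefit exceeds their overhead.

    workload -- list of SQL query strings (a representative usage sample)
    """
    # Naive extraction of attributes used in filters, e.g. "WHERE city = 'Lyon' AND year = 2005".
    filter_attrs = Counter()
    for query in workload:
        for groups in re.findall(r"WHERE\s+(\w+)|AND\s+(\w+)", query, flags=re.IGNORECASE):
            filter_attrs.update(attr for attr in groups if attr)

    suggestions = []
    for attr, hits in filter_attrs.most_common():
        estimated_gain = hits * gain_per_hit       # the more a filter recurs, the more an index helps
        if estimated_gain > overhead_per_index:    # keep only structures that "pay for themselves"
            suggestions.append((attr, estimated_gain))
    return suggestions

workload = ["SELECT SUM(sales) FROM f WHERE city = 'Lyon' AND year = 2005"] * 60
print(candidate_indexes(workload))   # -> [('city', 60.0), ('year', 60.0)]
```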
Atigui, Faten. "Approche dirigée par les modèles pour l’implantation et la réduction d’entrepôts de données". Thesis, Toulouse 1, 2013. http://www.theses.fr/2013TOU10044/document.
Texto completoOur work deals with decision support systems based on multidimensional Data Warehouses (DW). A Data Warehouse (DW) is a large collection of data, often historical, used for complex and sophisticated analysis. It supports the business processes within an organization. The data relevant to the decision-making process are collected from data sources by means of software processes commonly known as ETL (Extraction-Transformation-Loading) processes. The study of existing systems and methods shows two major limitations. Indeed, when building a DW, the designer deals with two major issues. The first concerns the design of the DW, whereas the second addresses the design of the ETL processes. Current frameworks provide partial solutions that focus either on the multidimensional structure or on the ETL processes, although both could benefit from each other. However, few studies have considered these issues in a unified framework and provided solutions to automate all of these tasks. From its creation, the DW holds a large amount of data, mainly due to historical data. Looking into decision makers' analyses over time, we can see that they are usually less interested in old data. To overcome these shortcomings, this thesis aims to formalize the development of a time-varying DW (with a temporal dimension) from its design to its physical implementation. We use Model-Driven Engineering (MDE), which automates the process and thus significantly reduces development costs and improves software quality. The contributions of this thesis are summarized as follows: 1. To formalize and automate the development of a time-varying DW within a model-driven approach that provides: - a set of unified (conceptual, logical and physical) metamodels that describe data and transformation operations; - an OCL (Object Constraint Language) extension that conceptually formalizes the transformation operations; - a set of transformation rules that maps the conceptual model to logical and physical models; - a set of transformation rules that generates the code. 2. To formalize and automate historical data reduction within a model-driven approach that provides: - a set of (conceptual, logical and physical) metamodels that describe the reduced data; - a set of reduction operations; - a set of transformation rules that implement these operations at the physical level. In order to validate our proposals, we have developed a prototype composed of three parts. The first part performs the transformation of models to lower-level models. The second part transforms the physical model into code. The last part carries out the DW reduction
Péguiron, Frédérique. "Application de l'Intelligence Économique dans un Système d'Information Stratégique universitaire : les apports de la modélisation des acteurs". Nancy 2, 2006. http://docnum.univ-lorraine.fr/public/NANCY2/doc240/2006NAN21014.pdf.
Texto completoThe economic intelligence process makes it possible to turn a university information system into a university strategic information system. Two questions guide our approach: "Does adopting an economic intelligence approach to improving the information system make it possible to improve user satisfaction?" and "How can the representation of the user be integrated?". We study the processes specific to the organization, the processes specific to the teacher and the processes specific to the student in order to propose a model, "RUBI3". The experimentation shows the technical and organizational difficulties of building a data warehouse while taking the university context into account. We identify several levels at which a university data warehouse can be built: the modeling level, the application level and the meta-modeling level. The integration of the documentary information system into the decisional information system of the university leads to an economic intelligence system. The worlds of indexing and of decision support are connected by the data warehouses
Boullé, Marc. "Recherche d'une représentation des données efficace pour la fouille des grandes bases de données". Phd thesis, Télécom ParisTech, 2007. http://pastel.archives-ouvertes.fr/pastel-00003023.
Texto completoNicolicin, Georgescu Vlad. "Knowledge acquisition and management for driving a decision support system - BI Self-X -". Nantes, 2011. https://archive.bu.univ-nantes.fr/pollux/show/show?id=8a74ae6d-d6f5-429b-a51c-9ea9117902b0.
Texto completoThis thesis combines three major research domains: (i) the management of Decision Support Systems (DSS) and data warehouses, (ii) autonomic task management using Autonomic Computing and (iii) the transformation and modeling of knowledge using Semantic Web technologies and ontologies. In the literature, most references address operational systems, which are fundamentally different from DSSs, and there is a lack of well-defined management best practices for DSS. In this context, two main issues are addressed: (i) the integration of DSS management knowledge into a unified knowledge source with the help of ontologies and (ii) the use of the integrated knowledge base with the Autonomic Computing model. The principal contributions of the thesis are: (i) the elaboration of an ontology model of the DSS and its management policies, covering architectures, parameters, technical performance, subjective performance (QoS), best practices, known issues and service levels (SLA/O); (ii) the elaboration of an Autonomic Computing adoption model that provides the DSS with self-management functions (configuration, healing and optimization), with the main purpose of improving service levels; (iii) the development of BI Self-X, composed of three modules, each in charge of an AC self-management function. The results obtained with this approach show that enterprises using BI Self-X with their DSS increase performance and service levels while decreasing the costs and time of implementing and maintaining their data warehouses
Karadimas, Harry. "Stratégies et modèles de données pour la mise en place d'un système d'aide à la décision dirigé par les données et basé sur la syntaxe Arden". Paris 13, 2005. http://www.theses.fr/2005PA132039.
Texto completo
Arres, Billel. "Optimisation des performances dans les entrepôts distribués avec Mapreduce : traitement des problèmes de partionnement et de distribution des données". Thesis, Lyon, 2016. http://www.theses.fr/2016LYSE2012.
Texto completoIn this manuscript, we address the problems of data partitioning and distribution for large-scale data warehouses distributed with MapReduce. First, we address the problem of data distribution. In this case, we propose a strategy to optimize data placement on distributed systems, based on the collocation principle. The objective is to optimize query performance through the definition of an intentional data distribution schema that reduces the amount of data transferred between nodes during processing, specifically during MapReduce's shuffle phase. Secondly, we propose a new approach to improve data partitioning and placement in distributed file systems, especially Hadoop-based systems, Hadoop being the standard implementation of the MapReduce paradigm. The aim is to overcome the default data partitioning and placement policies, which do not take any relational data characteristics into account. Our proposal proceeds in two steps. Based on the query workload, it defines an efficient partitioning schema. After that, the system defines a data distribution schema that best meets the users' needs by collocating data blocks on the same or the closest nodes. The objective in this case is to optimize query execution and parallel processing performance by improving data access. Our third proposal addresses the problem of workload dynamicity, since users' analytical needs evolve over time. In this case, we propose the use of multi-agent systems (MAS) as an extension of our data partitioning and placement approach. Through the autonomy and self-control that characterize MAS, we developed a platform that automatically defines new distribution schemas as new queries are added to the system, and applies data rebalancing according to the new schema. This relieves the system administrator of the burden of managing load balancing, besides improving query performance through careful data partitioning and placement policies. Finally, to validate our contributions, we conducted a set of experiments to evaluate the different approaches proposed in this manuscript. We study the impact of intentional data partitioning and distribution on the data warehouse loading phase, the execution of analytical queries, OLAP cube construction and load balancing. We also defined a cost model that allowed us to evaluate and validate the partitioning strategy proposed in this work
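A toy illustration of the collocation principle described above: blocks of the fact table and of a dimension table are assigned to nodes by hashing the same join key, so tuples that must be joined together end up on the same node and do not travel during the shuffle phase. The node count and table layout are assumptions made for this sketch, not the partitioning schema of the thesis.

```python
from collections import defaultdict

def collocate(fact_rows, dim_rows, join_key, num_nodes=4):
    """Assign fact and dimension rows to nodes by hashing a common join key."""
    placement = defaultdict(lambda: {"facts": [], "dims": []})
    for row in fact_rows:
        node = hash(row[join_key]) % num_nodes   # same key -> same node
        placement[node]["facts"].append(row)
    for row in dim_rows:
        node = hash(row[join_key]) % num_nodes
        placement[node]["dims"].append(row)
    return placement

facts = [{"customer_id": c, "amount": 10 * c} for c in (1, 2, 3, 1)]
dims = [{"customer_id": c, "country": "FR"} for c in (1, 2, 3)]
nodes = collocate(facts, dims, "customer_id")
# Each node can now join its local facts and dims without any inter-node transfer.
for node, data in sorted(nodes.items()):
    print(node, len(data["facts"]), len(data["dims"]))
```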
Bimonte, Sandro. "Intégration de l'information géographique dans les entrepôts de données et l'analyse en ligne : de la modélisation à la visualisation". Lyon, INSA, 2007. http://theses.insa-lyon.fr/publication/2007ISAL0105/these.pdf.
Texto completoData warehouse and OLAP systems are decision-making solutions. The integration of spatial data into OLAP systems is an important challenge. Indeed, geographic information is always present, implicitly or explicitly, in data, but it is generally not well handled in the decision-making process. Spatial OLAP (SOLAP) systems, which integrate OLAP and Geographic Information Systems (GIS), are a promising way forward. Most SOLAP solutions reduce geographic information to its spatial component, limiting the analysis capabilities of the spatio-multidimensional paradigm. We propose a formal model (GeoCube) and its associated algebra. GeoCube reformulates the main SOLAP concepts in order to introduce the semantic and spatial aspects of geographic information into multidimensional analysis. We model measures and dimension members as geographic and/or complex objects. A measure can belong to one or more hierarchies. We propose an algebra which provides the drill and slice operators, an operator to invert measures and dimensions, and two operators to navigate within the hierarchy of the measure. The algebra makes it possible to introduce spatial analysis methods into multidimensional analysis through new operators which dynamically change the structure of the hypercube. We have built a web prototype based on GeoCube. We illustrate our work using environmental data on the pollution of the Venice lagoon. Finally, we propose a new visualization and interaction paradigm for analyzing geographic measures
Mavroudakis, Nicolas. "Stimulation magnétique corticale: données normatives et modifications pharmacologiques et pathologiques". Doctoral thesis, Universite Libre de Bruxelles, 2000. http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/211691.
Texto completo
Negre, Elsa. "Exploration collaborative de cubes de données". Thesis, Tours, 2009. http://www.theses.fr/2009TOUR4023/document.
Texto completoData warehouses store large volumes of multidimensional, consolidated and historized data intended to be explored and analyzed by various users. Data exploration is the process of searching for relevant information within a data set. In our work, the data set to be explored is a data cube, an extract of the data warehouse that users query by issuing sequences of OLAP (On-Line Analytical Processing) queries. However, the mass of information to explore can be very large and varied, so it is necessary to help users cope with it by guiding them in their exploration of the data cube so that they find relevant information. The work presented in this thesis aims to propose recommendations, in the form of OLAP queries, to a user querying a data cube. This proposal leverages what other users have done during their previous explorations of the same data cube. We begin with an overview of the framework and techniques used in Information Retrieval, Web Usage Mining and e-commerce. Then, drawing on this framework, we present a state of the art on assisting the exploration of (relational and multidimensional) databases. This allows us to identify lines of work in the context of multidimensional databases. We then propose a generic framework for generating recommendations, generic in the sense that the three steps of the process are parameterizable. Thus, from a set of query sequences, corresponding to the explorations of the data cube previously carried out by different users, and from the current user's query sequence, our framework proposes a set of queries that could follow the current query sequence. Various instantiations of this framework are then proposed. We then present a prototype written in Java that allows a user to specify the current query sequence and returns a set of recommendations. This prototype allows us to validate our approach and to verify its effectiveness with a series of experiments. Finally, in order to improve this collaborative assistance for data cube exploration and to allow, in particular, query sharing, navigation within the queries issued on the data cube, and query annotation, we propose a framework for organizing queries. An instantiation adapted to the management of recommendations is presented
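As a minimal sketch of this kind of log-based recommendation (assuming invented query identifiers and session logs, and ignoring the parameterizable steps of the actual framework), the function below looks through past users' query sequences for occurrences of the current user's last query and suggests the queries that most often followed it.

```python
from collections import Counter

def recommend_next_queries(past_sessions, current_sequence, top_k=3):
    """Recommend follow-up queries based on what followed the same query in past sessions.

    past_sessions    -- list of past query sequences, e.g. [["q1", "q4", "q2"], ...]
    current_sequence -- the current user's sequence of OLAP queries
    """
    if not current_sequence:
        return []
    last_query = current_sequence[-1]
    followers = Counter()
    for session in past_sessions:
        for position, query in enumerate(session[:-1]):
            if query == last_query:
                followers[session[position + 1]] += 1
    return [query for query, _ in followers.most_common(top_k)]

logs = [["q1", "q4", "q2"], ["q3", "q4", "q2"], ["q4", "q5"]]
print(recommend_next_queries(logs, ["q7", "q4"]))   # -> ['q2', 'q5']
```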
Amanzougarene, Fatiha. "Extension du modèle multidimensionnel aux faits qualitatifs. Application à l'analyse en ligne des gênes des chantiers urbains". Versailles-St Quentin en Yvelines, 2014. http://www.theses.fr/2014VERS0019.
Texto completoData warehouses and OLAP systems constitute the main elements of decision support systems. In recent years, several studies have been conducted to extend the capabilities of conventional data warehouses to handle complex data types (e.g., text, multimedia, geographic data, etc.). In this thesis, we focus on the problem of integrating qualitative information into multidimensional analysis. Our work is guided by a case study on the annoyances caused by urban building sites. After defining the notion of annoyance and determining the factors involved in its evaluation, we highlighted the need for a qualitative representation model based on rules and expert knowledge. However, conventional multidimensional data models only consider quantitative measures. Therefore, our main contribution is to extend the multidimensional model to handle qualitative measures expressed as linguistic terms. Considering that expert knowledge is sometimes incomplete, our second contribution is an original model for missing data reconstruction in the context of data warehouses. This model combines constraint programming with a machine learning technique, namely the k-nearest neighbors algorithm. In addition to its application in classical data warehouses, our model adapts to qualitative data warehouses, as in the analysis of the annoyances of urban building sites
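The flavor of the k-nearest-neighbors reconstruction can be sketched as follows: the records most similar on the known qualitative attributes elect the missing label by majority vote. The mismatch-count distance and the example records are assumptions; the constraint-programming part of the model is not shown.

```python
from collections import Counter

def mismatch_distance(a, b, attrs):
    """Count attribute disagreements between two records (a simple categorical distance)."""
    return sum(1 for attr in attrs if a.get(attr) != b.get(attr))

def knn_impute(record, complete_records, target_attr, k=3):
    """Fill record[target_attr] with the majority value among its k nearest neighbours."""
    known_attrs = [a for a in record if a != target_attr and record[a] is not None]
    neighbours = sorted(
        complete_records,
        key=lambda other: mismatch_distance(record, other, known_attrs),
    )[:k]
    votes = Counter(other[target_attr] for other in neighbours)
    return votes.most_common(1)[0][0]

sites = [
    {"duration": "long", "location": "dense", "noise": "high"},
    {"duration": "long", "location": "dense", "noise": "high"},
    {"duration": "short", "location": "sparse", "noise": "low"},
]
incomplete = {"duration": "long", "location": "dense", "noise": None}
print(knn_impute(incomplete, sites, "noise"))   # -> 'high'
```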
Khemiri, Rym. "Vers l'OLAP collaboratif pour la recommandation des analyses en ligne personnalisées". Thesis, Lyon 2, 2015. http://www.theses.fr/2015LYO22015/document.
Texto completoThe objective of this thesis is to provide a collaborative approach to OLAP involving several users, driven by a personalization process integrated into decision-making systems, in order to help the end user in the analysis process. Whether it is personalizing the warehouse model, recommending decision queries or recommending navigation paths within data cubes, the user needs an efficient decision-making system that assists him. We are interested in three issues falling within data warehouse and OLAP personalization, and we offer three major contributions. Our contributions are based on a combination of data mining techniques with data warehouse and OLAP technology. Our first contribution is an approach for personalizing dimension hierarchies in order to obtain new, semantically richer analysis axes that can help the user carry out analyses not provided for by the original data warehouse model. Indeed, we relax the constraint of the fixed data warehouse model, which allows the user to create new, relevant analysis axes taking into account both his constraints and his requirements. Our approach is based on an unsupervised learning method, constrained k-means. Our goal is then to recommend these new hierarchy levels to other users of the same community, in the spirit of a collaborative system in which each individual brings his contribution. The second contribution is an interactive approach that helps the user formulate new decision queries to build relevant OLAP cubes based on his past decision queries, allowing him to anticipate his future analysis needs. This approach is based on the extraction of frequent itemsets from a query load associated with one user or a set of users belonging to the same community of actors in an organization. Our intuition is that the relevance of a decision query is strongly correlated with the usage frequency of the corresponding attributes within a given workload of a user (or group of users). Indeed, our decision query formulation approach is collaborative because it allows the user to formulate relevant queries, step by step, from the attributes most commonly used by all the actors of the user community. Our third contribution is an approach for recommending navigation paths within OLAP cubes. Users are often left to themselves and are not guided in their navigation process. To overcome this problem, we develop a user-centered approach that offers the user navigation guidance. Indeed, we guide the user towards the most interesting facts in OLAP cubes by indicating the most relevant navigation paths for him. This approach is based on Markov chains that predict the next analysis query from the current query alone. This work is part of a collaborative approach because the transition probabilities from one query to another in the cuboid lattice (OLAP cube) are calculated by taking into account all the analysis queries of all the users belonging to the same community. To validate our proposals, we present a user-centered decision support system which comes in two subsystems: (1) content personalization and (2) recommendation of decision queries and navigation paths. We also conducted experiments that showed the effectiveness of our user-centered online analysis approaches using quality measures such as recall and precision
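To fix ideas on the navigation-path recommendation, here is a minimal first-order Markov chain built from a pooled query log: transition probabilities estimated from all users' sessions are used to suggest the most likely next query given only the current one. The session contents and query names are invented; the actual system operates on the cuboid lattice of an OLAP cube.

```python
from collections import Counter, defaultdict

def build_transition_model(sessions):
    """Estimate P(next query | current query) from all users' query sessions."""
    counts = defaultdict(Counter)
    for session in sessions:
        for current, nxt in zip(session, session[1:]):
            counts[current][nxt] += 1
    model = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        model[current] = {q: n / total for q, n in followers.items()}
    return model

def suggest_next(model, current_query):
    """Return the most probable next query, or None if the query was never observed."""
    followers = model.get(current_query)
    if not followers:
        return None
    return max(followers, key=followers.get)

sessions = [["by_year", "by_month", "by_store"], ["by_year", "by_month"], ["by_year", "by_store"]]
model = build_transition_model(sessions)
print(suggest_next(model, "by_year"))   # -> 'by_month' (probability 2/3)
```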
Barkat, Okba. "Utilisation conjointe des ontologies et du contexte pour la conception des systèmes de stockage de données". Thesis, Chasseneuil-du-Poitou, Ecole nationale supérieure de mécanique et d'aérotechnique, 2017. http://www.theses.fr/2017ESMA0001/document.
Texto completoWe are witnessing an era in which every company is strongly interested in collecting and analyzing data from heterogeneous and varied sources. These sources also have another specificity, namely context awareness. Three complementary problems are identified: (i) the resolution of the heterogeneity of the sources, (ii) the construction of a decisional integrating system, and (iii) taking the context into account in this integration. To solve these problems, we are interested in this thesis in the design of contextual applications based on a domain ontology. To do this, we first propose a context model that integrates the main dimensions identified in the literature. Once built, it is linked to the ontology model. This approach increases flexibility in the design of advanced applications. Then, we propose two case studies: (1) the contextualization of semantic data sources, where we extend the OntoBD/OntoQL system to take the context into account, and (2) the design of a contextual data warehouse, where the context model is projected onto the different phases of the design life cycle. To validate our proposal, we present a tool implementing the different phases of the proposed design approach
Favre, Cécile. "Évolution de schémas dans les entrepôts de données : mise à jour de hiérarchies de dimension pour la personnalisation des analyses". Phd thesis, Université Lumière - Lyon II, 2007. http://tel.archives-ouvertes.fr/tel-00269037.
Texto completo
Hachicha, Marouane. "Modélisation de hiérarchies complexes dans les entrepôts de données XML et traitement des problèmes d'additivité dans l'analyse en ligne XOLAP". Thesis, Lyon 2, 2012. http://www.theses.fr/2012LYO22016/document.
Texto completoSince its inception in 1998, the eXtensible Markup Language (XML) has emerged as a standard for data representation and exchange over the Internet. XML provides an opportunity for modeling data structures that are not easily represented in relational systems. In this context, XML data warehouses nowadays form the basis of several decision-support applications exploiting heterogeneous data (weakly structured and coming from various sources) bearing complex structures, such as complex hierarchies. In this thesis, we propose a novel XOLAP (XML-OLAP) approach that automatically detects and processes summarizability issues at query time, without requiring any particular expertise from the user. Thus, at the logical level, we choose XML data trees, so-called multidimensional data trees, to model the multidimensional structures (facts, dimensions, measures and complex hierarchies) of XML data warehouses. In order to query multidimensional data trees, we model user queries as XML pattern trees. Then, we introduce a new aggregation algorithm to address summarizability issues in complex hierarchies. On the basis of this algorithm, we propose a novel XOLAP roll-up operator. Finally, we experimentally validate our proposal and compare our approach with the reference approach for addressing summarizability issues in complex hierarchies. To this end, we extend the XML warehouse benchmark XWeB with complex hierarchies in order to generate XML data warehouses with scalable complex hierarchies. The results of our experiments show that the overhead induced by managing hierarchy complexity at run time is fully acceptable and that our approach is expected to scale up well
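A tiny example of the summarizability problem this approach addresses: in a non-strict hierarchy where a leaf member rolls up to several parents, naively summing per parent double-counts the measure. One common repair, sketched below under the assumption of an even split across parents, preserves the grand total; it is shown only to illustrate the issue and is not the roll-up operator defined in the thesis.

```python
from collections import defaultdict

def safe_rollup(facts, parents_of):
    """Roll a measure up one hierarchy level without double counting.

    facts      -- mapping leaf member -> measure value
    parents_of -- mapping leaf member -> list of parent members (possibly several)
    """
    totals = defaultdict(float)
    for leaf, value in facts.items():
        parents = parents_of[leaf]
        share = value / len(parents)     # distribute the measure among multiple parents
        for parent in parents:
            totals[parent] += share
    return dict(totals)

sales = {"paperA": 100.0, "paperB": 40.0}
# "paperA" belongs to two categories: a naive per-category sum would count it twice.
categories = {"paperA": ["XML", "OLAP"], "paperB": ["OLAP"]}
print(safe_rollup(sales, categories))                  # -> {'XML': 50.0, 'OLAP': 90.0}
print(sum(safe_rollup(sales, categories).values()))    # grand total stays 140.0
```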