Dissertations on the topic "Data Warehousing and Online Analytical Processing"

Consult the top 33 dissertations for your research on the topic "Data Warehousing and Online Analytical Processing."


1

Koylu, Caglar. "A Case Study In Weather Pattern Searching Using A Spatial Data Warehouse Model." Master's thesis, METU, 2008. http://etd.lib.metu.edu.tr/upload/2/12609573/index.pdf.

Abstract:
Data warehousing and Online Analytical Processing (OLAP) technology is used to access, visualize and analyze multidimensional, aggregated and summarized data. A large part of this data contains spatial components, which convey valuable information and must be included in the exploration and analysis phases of a spatial decision support system (SDSS). Geographic Information Systems (GISs), in turn, provide a wide range of tools to analyze spatial phenomena and therefore must be included in the analysis phases of a decision support system (DSS). In this regard, this study searches for answers to the problem of how to design a spatially enabled data warehouse architecture that supports spatio-temporal analysis and the exploration of multidimensional data. To this end, the concepts of OLAP and GISs are synthesized in an integrated fashion, maximizing the benefits of the strengths of both systems in a spatial data warehouse model. A multidimensional spatio-temporal data model is proposed as the result of this synthesis; it addresses the integration of spatial, non-spatial and temporal data and facilitates spatial data exploration and analysis. The model is evaluated by implementing a case study in weather pattern searching.
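As a minimal illustration of the kind of spatio-temporal roll-up such a spatially enabled warehouse supports (a sketch only: the stations, grid hierarchy and measures below are invented, not the thesis's model), point weather observations can be aggregated into a (region, month) cube:

```python
from collections import defaultdict

# Hypothetical weather facts: (station, lat, lon, date "YYYY-MM-DD", temperature).
facts = [
    ("S1", 39.9, 32.8, "2007-01-03", 2.5),
    ("S1", 39.9, 32.8, "2007-01-17", 4.0),
    ("S2", 38.4, 27.1, "2007-01-09", 9.0),
    ("S2", 38.4, 27.1, "2007-02-12", 11.5),
]

def region(lat, lon):
    # Toy spatial hierarchy: snap coordinates to a 5-degree grid cell.
    return (int(lat // 5) * 5, int(lon // 5) * 5)

# Roll temperatures up to (region, month); keep sum and count for averages.
cube = defaultdict(lambda: [0.0, 0])
for station, lat, lon, date, temp in facts:
    key = (region(lat, lon), date[:7])  # month = "YYYY-MM"
    cube[key][0] += temp
    cube[key][1] += 1

for (reg, month), (total, n) in sorted(cube.items()):
    print(f"region {reg}, {month}: avg temperature {total / n:.1f}")
```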
2

Reil, Michael J., T. George Bartlett, and Kevin Henry. "REAL TIME DATA WAREHOUSING AND ON LINE ANALYTICAL PROCESSING AT ABERDEEN TEST CENTER." International Foundation for Telemetering, 2006. http://hdl.handle.net/10150/604257.

Abstract:
ITC/USA 2006 Conference Proceedings / The Forty-Second Annual International Telemetering Conference and Technical Exhibition / October 23-26, 2006 / Town and Country Resort & Convention Center, San Diego, California
This paper is a follow-on to a paper presented at the 2005 International Telemetry Conference by Dr. Samuel Harley et al., titled "Data, Information, and Knowledge Management". It describes new techniques and provides further detail on the inner workings of the VISION (Versatile Information System – Integrated, Online) Engineering Performance Data Mart.
3

Leonardi, Luca <1983>. "A framework for trajectory data warehousing and visual OLAP analysis." Doctoral thesis, Università Ca' Foscari Venezia, 2012. http://hdl.handle.net/10579/1237.

Abstract:
This thesis designs a formal framework for modelling a Trajectory Data Warehouse (TDW), namely a data warehouse able to store aggregate information about the trajectories of moving objects, which also offers visual OLAP operations for data analysis. The TDW model includes both a temporal and a spatial dimension, which ensure flexibility and make the model general enough to deal with objects that are either completely free or constrained in their movements. These dimensions and their associated hierarchies reflect the structure of the objects and of the environment in which they travel. The framework also includes a visual interface for easily navigating the aggregate measures obtained from OLAP queries at different granularities. To highlight the usefulness of the framework, two case studies are proposed, differing in the type of moving objects observed, the information available about them, and their movement constraints.
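One classic TDW measure, the number of distinct objects present in a spatial cell during a time interval, can be sketched in a few lines (the sample points and grid are invented for illustration; this is not the thesis's model):

```python
from collections import defaultdict

# Hypothetical trajectory samples: (object_id, x, y, hour).
points = [
    ("car1", 12.0, 3.5, 8), ("car1", 14.2, 3.9, 8), ("car1", 21.7, 4.1, 9),
    ("car2", 13.1, 2.8, 8), ("car2", 13.5, 2.9, 9),
]

CELL = 10.0  # side of a square grid cell: the finest spatial granularity

# presence[(cell, hour)] = set of distinct objects seen there in that hour.
presence = defaultdict(set)
for obj, x, y, hour in points:
    cell = (int(x // CELL), int(y // CELL))
    presence[(cell, hour)].add(obj)

for (cell, hour), objs in sorted(presence.items()):
    print(f"cell {cell}, hour {hour}: presence = {len(objs)}")

# Roll-up along the temporal hierarchy (hour -> morning): re-union the
# object sets so that each object is still counted only once.
morning = defaultdict(set)
for (cell, hour), objs in presence.items():
    morning[cell] |= objs
for cell, objs in sorted(morning.items()):
    print(f"cell {cell}, morning: presence = {len(objs)}")
```

Keeping sets rather than counts at the base granularity is what makes the distinct-count measure safe to aggregate; summing per-hour counts would double-count objects seen in both hours.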
4

Ives, Zachary G. "Efficient query processing for data integration /." Thesis, Connect to this title online; UW restricted, 2002. http://hdl.handle.net/1773/6864.

5

Marquardt, Justus. "Metadatendesign zur Integration von Online Analytical Processing in das Wissensmanagement /." Hamburg : Kovač, 2008. http://www.verlagdrkovac.de/978-3-8300-3598-5.htm.

6

Marquardt, Justus. "Metadatendesign zur Integration von online analytical processing in das Wissensmanagement." Hamburg: Kovač, 2007. http://www.verlagdrkovac.de/978-3-8300-3598-5.htm.

7

Schwarz, Holger. "Integration von Data-Mining und online analytical processing : eine Analyse von Datenschemata, Systemarchitekturen und Optimierungsstrategien /." [S.l. : s.n.], 2003. http://www.bsz-bw.de/cgi-bin/xvms.cgi?SWB10720634.

8

Westerlund, Elisabeth, and Hanna Persson. "Implementation of Business Intelligence Systems : A study of possibilities and difficulties in small IT-enterprises." Thesis, Uppsala universitet, Företagsekonomiska institutionen, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-255915.

Abstract:
This thesis was written at the Department of Business Studies at Uppsala University. The study addresses the differences in the possibilities and difficulties of implementing business intelligence (BI) systems among small IT enterprises; BI systems support enterprises in decision-making. To answer the aim of this thesis, theories regarding the organizational factors that determine a successful implementation of a BI system were used, together with theories regarding the components of BI systems, the data warehouse (DW) and online analytical processing (OLAP), which enable the decision support a BI system provides. A qualitative study was performed at four different IT enterprises to gather the empirical material; interviews were conducted with CEOs and additional employees at the enterprises. The gathered material was then analyzed to draw conclusions regarding the research topic. The study concluded that there are differences in the possibilities and difficulties of implementing BI systems among small IT enterprises. One difference among the enterprises is the perceived ability to finance an implementation. Another lies in the managerial and organizational support for an implementation, and also in the business need of using a BI system in decision-making. There are also differences in how the enterprises use a DW: not all enterprises benefit from the ability of a DW to manage complex and large amounts of data, nor from the advanced analysis performed with OLAP. The enterprises thus need to examine further whether a BI system would be beneficial and used successfully in their company.
9

Techaplahetvanich, Kesaraporn. "A visualization framework for exploring correlations among atributes of a large dataset and its applications in data mining." University of Western Australia. School of Computer Science and Software Engineering, 2007. http://theses.library.uwa.edu.au/adt-WU2007.0216.

Abstract:
[Truncated abstract] Many databases in scientific and business applications have grown exponentially in size in recent years. Accessing and using databases is no longer a specialized activity as more and more ordinary users without any specialized knowledge are trying to gain information from databases. Both expert and ordinary users face significant challenges in understanding the information stored in databases. The databases are so large in most cases that it is impossible to gain useful information by inspecting data tables, which are the most common form of storing data in relational databases. Visualization has emerged as one of the most important techniques for exploring data stored in large databases. Appropriate visualization techniques can reveal trends, correlations and associations in data that are very difficult to understand from a textual representation of the data. This thesis presents several new frameworks for data visualization and visual data mining. The first technique, VisEx, is useful for visual exploration of large multi-attribute datasets and especially for exploring the correlations among the attributes in such datasets. Most previous visualization techniques can display correlations among two or three attributes at a time without excessive screen clutter. ... Although many algorithms for mining association rules have been researched extensively, they do not incorporate users in the process and most of them generate a large number of association rules. It is quite often difficult for the user to analyze a large number of rules to identify a small subset of rules that is of importance to the user. In this thesis I present a framework for the user to interactively mine association rules visually. Another challenging task in data mining is to understand the correlations among the mined association rules. It is often difficult to identify a relevant subset of association rules from a large number of mined rules. A further contribution of this thesis is a simple framework in the VisAR system that allows the user to explore a large number of association rules visually. A variety of businesses have adopted new technologies for storing large amounts of data. Analysis of historical data quite often offers new insights into business processes that may increase productivity and profit. On-line analytical processing (OLAP) has become a powerful tool for business analysts to explore historical data. Effective visualization techniques are very important for supporting OLAP technology. A new technique for the visual exploration of OLAP data cubes is also presented in this thesis.
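The support and confidence measures that underlie such interactive rule mining are simple to compute; a toy scoring pass (with invented transactions) on top of which a visual filter could sit:

```python
from itertools import combinations

# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"},
    {"milk", "butter"}, {"bread", "milk"},
]

def support(itemset):
    # Fraction of transactions containing every item of the itemset.
    return sum(itemset <= t for t in transactions) / len(transactions)

# Score every one-to-one rule {a} -> {b} over the item universe.
items = sorted(set().union(*transactions))
for a, b in combinations(items, 2):
    for lhs, rhs in ((a, b), (b, a)):
        s = support({lhs, rhs})
        if s == 0:
            continue
        conf = s / support({lhs})
        print(f"{{{lhs}}} -> {{{rhs}}}: support={s:.2f}, confidence={conf:.2f}")
```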
10

Jakawat, Wararat. "Graphs enriched by Cubes (GreC) : a new approach for OLAP on information networks." Thesis, Lyon, 2016. http://www.theses.fr/2016LYSE2087/document.

Abstract:
Online Analytical Processing (OLAP) is one of the most important technologies in data warehouse systems; it enables the multidimensional analysis of data and represents a powerful and flexible analysis tool for navigating data at different depths. OLAP has been the subject of improvements and extensions across the board, with every new problem concerning domain and data: multimedia, spatial data, sequence data, etc. Originally, OLAP was introduced to analyze classical structured data, but information networks are yet another interesting domain. Extracting knowledge from large networks is a complex task, and OLAP analysis offers a more condensed view. Many kinds of information networks can help users with various activities in different domains. In this scenario, we consider bibliographic networks built on bibliographic databases; such data allows analyzing not only scientific production but also the collaborations between authors. Research proposals that apply OLAP technologies to information networks are known as Graph OLAP, and many Graph OLAP techniques are based on a cube of graphs. In this thesis, we propose a new approach for Graph OLAP: graphs enriched by cubes (GreC). In a different and complementary way, our proposal consists in enriching graphs with cubes: the nodes and/or edges of the considered network are described by data cubes. This allows interesting analyses for the user, who can navigate within a graph enriched by cubes according to different granularity levels, with dedicated operators. In addition, there are four main aspects in GreC. First, GreC takes into account the structure of the network in order to perform topological OLAP operations, not only classical or informational ones. Second, GreC offers a global view of the network with multidimensional information. Third, the slowly changing dimension problem is taken into account when exploring a network. Lastly, GreC allows analyzing the evolution of a network, since the time dimension in the cubes describing nodes and/or edges makes the dynamics observable. To evaluate GreC, we implemented our approach and performed an experimental study on a real bibliographic dataset to show the interest of our proposal. The GreC approach includes different algorithms, whose relevance and performance we validated experimentally.
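The difference between a cube of graphs and a graph enriched by cubes can be pictured with plain dictionaries: each edge of a co-authorship graph carries its own small cube. The structure and operators below are a hypothetical toy, not GreC's actual algorithms:

```python
from collections import defaultdict

# Each co-authorship edge carries a cube: (year, venue_type) -> joint papers.
graph = {
    ("Alice", "Bob"): {(2014, "journal"): 2, (2014, "conference"): 1,
                       (2015, "conference"): 3},
    ("Alice", "Carol"): {(2015, "journal"): 1},
}

def roll_up(cube, level):
    """Aggregate a (year, venue_type) cube along one dimension."""
    out = defaultdict(int)
    for (year, venue), count in cube.items():
        out[year if level == "year" else venue] += count
    return dict(out)

# An 'informational' view: each edge summarized by year only.
for edge, cube in graph.items():
    print(edge, roll_up(cube, "year"))

# A crude 'topological' aggregation: merge the cubes of all edges
# incident to one node, summarizing that author's collaborations.
alice = defaultdict(int)
for (u, v), cube in graph.items():
    if "Alice" in (u, v):
        for cell, count in cube.items():
            alice[cell] += count
print("Alice's merged cube:", dict(alice))
```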
11

Hayes, Timothy. "Novel vector architectures for data management." Doctoral thesis, Universitat Politècnica de Catalunya, 2015. http://hdl.handle.net/10803/397645.

Abstract:
As the rate of annual data generation grows exponentially, there is a demand to manage, query and summarise vast amounts of information quickly. In the past, frequency scaling was relied upon to push application throughput. Today, Dennard scaling has ceased, and further performance must come from exploiting parallelism. Vector architectures offer a highly efficient and scalable way of exploiting data-level parallelism (DLP) through sophisticated single instruction-multiple data (SIMD) instruction sets. Traditionally, vector machines were used to accelerate scientific workloads rather than business-domain applications. In this thesis, we design innovative vector extensions for a modern superscalar microarchitecture that are optimised for data management workloads. Based on extensive analysis of these workloads, we propose new algorithms, novel instructions and microarchitectural optimisations. We first profile a leading commercial decision support system to better understand where the execution time is spent. We find that the hash join operator is responsible for a significant portion of the time. Based on our profiling, we develop lightweight integer-based pipelined vector extensions to capture the DLP in the operator. We then proceed to implement and evaluate these extensions using a custom simulation framework based on PTLsim and DRAMSim2. We motivate key design decisions based on the structure of the algorithm and compare these choices against alternatives experimentally. We discover that relaxing the base architecture's memory model is very beneficial when executing a vectorised implementation of the algorithm. This relaxed model serves as a powerful mechanism to execute indexed vector memory instructions out of order without requiring complex associative hardware. We find that our vectorised implementation shows good speedups. Furthermore, the vectorised version exhibits better scalability compared to the original scalar version run on a microarchitecture with larger superscalar and out-of-order structures. We then make a detailed study of SIMD sorting algorithms. Using our simulation framework, we evaluate the strengths, weaknesses and scalability of three diverse vectorised sorting algorithms: quicksort, bitonic mergesort and radix sort. We find that each of these algorithms has its unique set of bottlenecks. Based on these findings, we propose VSR sort, a novel vectorised non-comparative sorting algorithm that is based on radix sort but without its drawbacks. VSR sort, however, cannot be implemented directly with typical vector instructions due to the irregularity of its DLP. To facilitate the implementation of this algorithm, we define two new vector instructions and propose a complementary hardware structure for their execution. We find that VSR sort significantly outperforms each of the other vectorised algorithms. Next, we propose and evaluate five different ways of vectorising GROUP BY data aggregations. We find that although data aggregation algorithms are abundant in DLP, it is often too irregular to be expressed efficiently using typical vector instructions. By extending the hardware used for VSR sort, we propose a set of vector instructions and novel algorithms to better capture this irregular DLP. Furthermore, we discover that the best algorithm is highly dependent on the characteristics of the input. Finally, we evaluate the area, energy and power of these extensions using McPAT.
Our results show that our proposed vector extensions come with a modest area overhead, even when using a large maximum vector length with lockstepped parallel lanes. Using sorting as a case study, we find that all of the vectorised algorithms consume much less energy than their scalar counterpart. In particular, our novel VSR sort requires an order of magnitude less energy than the scalar baseline. With respect to power, we discover that our vector extensions present a very reasonable increase in wattage.
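The irregular DLP of GROUP BY aggregation that the thesis targets can be imitated in NumPy, whose scatter-add `np.add.at` plays a role loosely analogous to the scatter-with-conflicts vector instructions proposed there (an illustration of the data-parallel pattern only, not of the proposed hardware):

```python
import numpy as np

# A measure column and its group keys (e.g., amounts grouped by store id).
keys = np.array([2, 0, 1, 0, 2, 1, 0])
values = np.array([5.0, 1.0, 2.5, 4.0, 0.5, 3.5, 2.0])

# Scatter-add: accumulate every value into its group's slot in one
# data-parallel operation, tolerating repeated (conflicting) indices.
sums = np.zeros(keys.max() + 1)
np.add.at(sums, keys, values)

counts = np.zeros(keys.max() + 1, dtype=int)
np.add.at(counts, keys, 1)

print("per-group sums:", sums)          # [7.  6.  5.5]
print("per-group means:", sums / counts)
```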
12

Mašek, Martin. "Datové sklady - principy, metody návrhu, nástroje, aplikace, návrh konkrétního řešení." Master's thesis, Vysoká škola ekonomická v Praze, 2007. http://www.nusl.cz/ntk/nusl-10145.

Abstract:
The main goal of this thesis is to summarize and introduce the general theoretical concepts of Data Warehousing using the systems approach. The thesis defines Data Warehousing and its main areas and delimits the Data Warehousing area within the higher-level area called Business Intelligence. It also describes the history of Data Warehousing and Business Intelligence, focuses on the key principles of Data Warehouse building and explains practical applications of this solution. The aim of the practical part is to evaluate the theoretical concepts and, based on that, to design and build a Data Warehouse in the environment of an existing company. The final solution includes the Data Warehouse design, hardware and software platform selection, loading with real data using ETL services and the building of end-user reports. The practical part also demonstrates the power of this technology and contributes to the business decision-making process in this company.
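At its core, the ETL step that loads a star schema is a dimension-key lookup followed by a fact insert; a minimal in-memory sketch (the table and column names are invented, not the thesis's design):

```python
# Minimal star schema: one dimension table and one fact table.
dim_customer = {}   # natural key -> surrogate key
fact_sales = []     # rows of (customer_sk, date, amount)

def lookup_or_insert(natural_key):
    """Return the surrogate key, creating the dimension row on first sight."""
    if natural_key not in dim_customer:
        dim_customer[natural_key] = len(dim_customer) + 1
    return dim_customer[natural_key]

# Extract: raw operational records (invented sample data).
staging = [
    ("ACME Ltd", "2007-03-01", 120.0),
    ("Brick & Co", "2007-03-01", 80.0),
    ("ACME Ltd", "2007-03-02", 45.0),
]

# Transform and load: resolve surrogate keys, then insert the facts.
for customer, date, amount in staging:
    fact_sales.append((lookup_or_insert(customer), date, amount))

print("dimension:", dim_customer)
print("facts:", fact_sales)
```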
13

Podsedník, Lukáš. "Klient pro zobrazování OLAP kostek." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2010. http://www.nusl.cz/ntk/nusl-235555.

Abstract:
At the beginning, the project describes the basics and uses of data warehousing and the OLAP techniques and operations employed within data warehouses. A description of one commercial OLAP client follows; based on the features of this product, the requirement analysis of a freeware OLAP cube client is described, choosing the functionality to be implemented. Using the requirement analysis, the structural design of the application (including UML diagrams) is made, and the best of the compared libraries, frameworks and development environments is chosen for the design. The next chapter covers the implementation and the tools and frameworks used in it. At the end, the thesis classifies the achieved results and outlines options for further improvement.
14

Pohl, Ondřej. "Analýza veřejně dostupných dat Českého statistického úřadu." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2017. http://www.nusl.cz/ntk/nusl-363884.

Abstract:
The aim of this thesis is the analysis of data of the Czech Statistical Office concerning foreign trade. At first, the reader is familiarized with Business Intelligence and data warehousing; then OLAP analysis and data mining basics are explained. The following parts of the thesis describe and analyze foreign trade data with the help of OLAP technology and data mining in MS SQL Server, including the implementation of selected analytical tasks.
15

Arres, Billel. "Optimisation des performances dans les entrepôts distribués avec Mapreduce : traitement des problèmes de partionnement et de distribution des données." Thesis, Lyon, 2016. http://www.theses.fr/2016LYSE2012.

Abstract:
In this manuscript, we address the problems of data partitioning and distribution for large-scale data warehouses distributed with MapReduce. First, we address the problem of data distribution. We propose a strategy to optimize data placement on distributed systems, based on the colocation principle. The objective is to optimize query performance through the definition of an intentional data distribution schema that reduces the amount of data transferred between nodes during processing, specifically during MapReduce's shuffle phase. Secondly, we propose a new approach to improve data partitioning and placement in distributed file systems, especially Hadoop-based systems, Hadoop being the standard implementation of the MapReduce paradigm. The aim is to overcome the default data partitioning and placement policies, which do not take any relational data characteristics into account. Our proposal proceeds in two steps: based on the query workload, it defines an efficient partitioning schema; after that, the system defines a data distribution schema that best meets the users' needs by collocating data blocks on the same or the closest nodes. The objective in this case is to optimize query execution and parallel processing performance by improving data access. Our third proposal addresses the problem of workload dynamicity, since users' analytical needs evolve over time. In this case, we propose the use of multi-agent systems (MAS) as an extension of our data partitioning and placement approach. Through the autonomy and self-control that characterize MAS, we developed a platform that automatically defines new distribution schemas as new queries arrive in the system and rebalances the data according to the new schema. This offloads from the system administrator the burden of managing load balance, besides improving query performance through careful data partitioning and placement policies. Finally, to validate our contributions, we conduct a set of experiments to evaluate the different approaches proposed in this manuscript. We study the impact of intentional data partitioning and distribution on the data warehouse loading phase, the execution of analytical queries, OLAP cube construction, and load balancing. We also define a cost model that allowed us to evaluate and validate the partitioning strategy proposed in this work.
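The colocation principle described above boils down to routing records that share a key to the same node, so that a later join or aggregation needs no shuffle. A single-process sketch of such a partitioner (node count and records invented; a stable hash stands in for Hadoop's partitioning function):

```python
import zlib

NODES = 3

def partition(key, n_nodes=NODES):
    """Map a record key to a node the way a MapReduce partitioner would;
    the stable hash guarantees equal keys always land on the same node."""
    return zlib.crc32(key.encode()) % n_nodes

# Two datasets sharing a join key: placing both with the same function
# colocates matching rows, so the join can run node-locally.
facts = [("cust1", 10.0), ("cust2", 7.5), ("cust1", 3.0)]
dims = [("cust1", "Lyon"), ("cust2", "Paris")]

placement = {n: {"facts": [], "dims": []} for n in range(NODES)}
for key, amount in facts:
    placement[partition(key)]["facts"].append((key, amount))
for key, city in dims:
    placement[partition(key)]["dims"].append((key, city))

for node, data in placement.items():
    print(f"node {node}: {data}")
```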
16

Ponelis, S. R. (Shana Rachel). "Data marts as management information delivery mechanisms: utilisation in manufacturing organisations with third party distribution." Thesis, University of Pretoria, 2002. http://hdl.handle.net/2263/27061.

Abstract:
Customer knowledge plays a vital part in organisations today, particularly in sales and marketing processes, where customers can be either channel partners or final consumers. Managing customer data and/or information across business units, departments and functions is vital. Frequently, channel partners gather and capture data about downstream customers and consumers that organisations further upstream in the channel require to be incorporated into their information systems, in order to allow management information delivery to their users. In this study, the focus is placed on manufacturing organisations using third-party distribution, since the flow of information between channel partner organisations in a supply chain (in contrast to the flow of products) provides an important link between organisations and increasingly represents a source of competitive advantage in the marketplace. The purpose of this study is to determine whether there is a significant difference in the use of sales and marketing data marts as management information delivery mechanisms in manufacturing organisations in different industries, particularly pharmaceuticals and branded consumer products. The case studies presented in this dissertation indicate that there are significant differences between the use of sales and marketing data marts in different manufacturing industries, which can be ascribed to the industry both directly and indirectly.
Thesis (MIS(Information Science))--University of Pretoria, 2002.
17

Pavlová, Petra. "Měření výkonnosti podniku." Master's thesis, Vysoká škola ekonomická v Praze, 2012. http://www.nusl.cz/ntk/nusl-165086.

Abstract:
This thesis deals with the application of Business Intelligence (BI) to support corporate performance management in ISS Europe, spol. s r. o. This company provides licences and implements original software products as well as third-party software products. First, an analysis is conducted in the given company, which then serves as the basis for the implementation of a BI solution interconnected with the company's strategies. The main goal is the implementation of a pilot BI solution to aid the monitoring and optimisation of corporate performance. Secondary goals include the analysis of related concepts, business strategy analysis, the identification of strategic goals and systems, and the proposition and implementation of a pilot BI solution. In its theoretical part, the thesis focuses on the analysis of concepts related to corporate performance and BI implementations, and briefly describes the company together with its business strategy. The practical part builds on the theoretical findings. An analysis of the company is carried out using the Balanced Scorecard (BSC) methodology, the result of which is depicted in a strategic map. This methodology is then supplemented by the Activity Based Costing (ABC) analytical method, which allocates expenses according to activities. The result is information about which expenses are linked to handling individual development, implementation and operational demands for particular contracts. This is followed by an original proposition and the implementation of a BI solution, which includes the creation of a Data Warehouse (DWH), the design of Extract, Transform and Load (ETL) and Online Analytical Processing (OLAP) systems, and the generation of sample reports. The main contribution of this thesis is in providing the company management with an analysis of company data from a multidimensional perspective, which can be used as a basis for prompt and correct decision-making, realistic planning, and performance and product optimisation.
18

Madron, Lukáš. "Datové sklady a OLAP v prostředí MS SQL Serveru." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2008. http://www.nusl.cz/ntk/nusl-235916.

Abstract:
This paper deals with data warehouses and OLAP. These technologies are defined and described, followed by an introduction to the architecture of MS SQL Server and its tools for working with data warehouses and OLAP. The knowledge gained is then used to create a sample application.
19

Cho, Moonjung. "Novel techniques for data warehousing and online analytical processing in emerging applications." 2006. http://proquest.umi.com/pqdweb?did=1225157711&sid=7&Fmt=2&clientId=39334&RQT=309&VName=PQD.

Abstract:
Thesis (Ph.D.)--State University of New York at Buffalo, 2006.
Title from PDF title page (viewed on April 25, 2007). Available through UMI ProQuest Digital Dissertations. Thesis advisers: Pei, Jian; He, Xin. Includes bibliographical references.
20

Veiga, Hugo Alexandre Carvalheira. "A comprehensive IVR (Interactive Voice Response) analysis model using online analytical processing (OLAP) on a multidimensional data cube." Master's thesis, 2014. http://hdl.handle.net/10400.6/5839.

Abstract:
Private Branch eXchange (PBX) is an indispensable tool in the business world. Telephone exchanges allow employees to make internal calls between telephones, or calls to the external network, also known as the Public Switched Telephone Network (PSTN). With increasing Internet usage, there is interest in understanding what services are offered. Enterprise Courier is a commercial Internet Protocol Private Branch eXchange (IP PBX) based on the open-source Asterisk web-based PBX software for Linux, which supports multiple protocols and services, like Interactive Voice Response (IVR). Cisco Unified Communications Manager (CUCM), or CallManager, is a software-based call-processing system (IP PBX) developed by Cisco Systems. CUCM tracks all active Voice over IP (VoIP) network components, including phones, gateways and conference bridges, among others. IVR is part of the Academic Services customer contact and ticketing of the University of Beira Interior (UBI). IVR monitoring and analysis are essential for effective operation and resource management; in particular, multidimensional analysis of long-term data is necessary for a comprehensive understanding of trends, the quality of customer service and the customer experience. In this work, we propose a new IVR analysis model for large volumes of IVR data accumulated over a long period of time. The proposed IVRCube is an analysis model using online analytical processing (OLAP) on a multidimensional data cube; it provides an easy and fast way to construct a multidimensional IVR analysis system for the comprehensive and detailed evaluation of long-term data. Its feasibility and applicability are validated, as the proposed IVRCube analysis model is implemented and applied to the Academic Services customer contact and ticketing IVR data.
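The cube view of IVR traffic is easy to picture with a few call records: build cells along (menu option, day, outcome), then slice or roll up. The fields below are invented for illustration and are not the IVRCube's actual schema:

```python
from collections import Counter

# Hypothetical IVR call records: (menu_option, day, outcome).
calls = [
    ("enrolment", "Mon", "resolved"), ("enrolment", "Mon", "abandoned"),
    ("fees", "Mon", "resolved"), ("enrolment", "Tue", "resolved"),
    ("fees", "Tue", "transferred"), ("fees", "Tue", "resolved"),
]

# Base cuboid: call counts at full (option, day, outcome) granularity.
cube = Counter(calls)

# Slice: fix day = "Tue", keeping the two remaining dimensions.
tue = {(opt, out): n for (opt, day, out), n in cube.items() if day == "Tue"}
print("slice day=Tue:", tue)

# Roll-up: collapse day and outcome to see the load per menu option.
per_option = Counter(opt for opt, _, _ in calls)
print("calls per option:", dict(per_option))
```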
21

Kroeze, J. H. (Jan Hendrik). "Developing an XML-based, exploitable linguistic database of the Hebrew text of Gen. 1:1-2:3." Thesis, 2008. http://hdl.handle.net/2263/26750.

Abstract:
The thesis discusses a series of related techniques that prepare and transform raw linguistic data for advanced processing in order to unveil hidden grammatical patterns. A three-dimensional array is identified as a suitable data structure for building a data cube that captures multidimensional linguistic data in a computer's temporary storage. It also enables online analytical processing, like slicing, to be executed on this data cube in order to reveal various subsets and presentations of the data. XML is investigated as a suitable mark-up language to permanently store such an exploitable databank of Biblical Hebrew linguistic data. This concept is illustrated by tagging a phonetic transcription of Genesis 1:1-2:3 on various linguistic levels and manipulating this databank. Transferring the data set between an XML file and a three-dimensional array creates a stable environment that allows editing and advanced processing of the data in order to confirm existing knowledge or to mine for new, yet undiscovered, linguistic features. Two experiments are executed to demonstrate possible text-mining procedures. Finally, visualisation is discussed as a technique that enhances interaction between the human researcher and the computerised technologies supporting the process of knowledge creation. Although the data set is very small, there are exciting indications that the compilation and analysis of aggregate linguistic data may assist linguists in performing rigorous research, for example regarding the definitions of semantic functions and the mapping of these functions onto the syntactic module.
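The three-dimensional array at the heart of the approach — clauses × phrase slots × linguistic features, with OLAP-style slicing — can be shown in a few lines. The integer codes here are a hypothetical toy encoding, not the thesis's tag set:

```python
import numpy as np

# Toy cube: axis 0 = clause, axis 1 = phrase slot, axis 2 = feature
# (0: part-of-speech code, 1: syntactic-function code).
cube = np.array([
    [[1, 0], [2, 1], [3, 2]],   # clause 1: three phrases
    [[1, 0], [3, 2], [0, 0]],   # clause 2: two phrases + padding
])

# 'Slicing' in the OLAP sense: hold one dimension fixed.
print("all syntactic functions:\n", cube[:, :, 1])  # fix the feature axis
print("clause 1 only:\n", cube[0])                  # fix the clause axis

# An aggregate over the databank: frequency of each part-of-speech code.
codes, counts = np.unique(cube[:, :, 0], return_counts=True)
print("POS frequency:", dict(zip(codes.tolist(), counts.tolist())))
```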
Thesis (PhD (Information Technology))--University of Pretoria, 2008.
22

Ribeiro, Raquel Gouveia. "Sistemas de bases de dados orientados por colunas." Master's thesis, 2013. http://hdl.handle.net/1822/28223.

Abstract:
Master's dissertation in Informatics Engineering
Recently, column-oriented database systems have drawn attention from researchers, database modelers and professionals interested in subjects such as architectures, system performance and scalability, and decision support applications, including Data Warehousing and Business Intelligence. In this type of system, each column of a database table is stored separately, unlike in traditional ("row-oriented") database systems. Thus, instead of storing one row after another, all the values of an attribute belonging to the same column are continuously compressed and stored in a slightly denser package. Applying this type of database system makes it possible to reduce the time of the queries typically used in a data warehousing environment, which would be difficult to achieve with conventional database systems. In this project, besides the generic approach to the subject, the approach was applied to a real case study. The techniques used follow Kimball's methodology [Kimball et al., 2008], with particular emphasis on the data model representation. After a first study was conducted, this work focused on analyzing the influence and usefulness of column-oriented database systems in a data warehousing system. Considering two different database systems, a relational and a non-relational one, a set of queries typical of a data warehousing environment was applied to the same data set, highlighting the differences in execution time. The attention given to the basic structure, features, description, manipulation and control languages, and management systems, among others, eventually facilitated the process of converting the database, populating it, and exploring the queries in the implemented data warehousing system.
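The storage idea is easy to see in miniature: keep each column separately, run-length encode the repetitive ones, and answer an aggregate by touching only the columns it needs. This is a sketch of the principle, not of any particular engine's format:

```python
# One table stored column-wise (invented sample data).
region = ["north", "north", "north", "south", "south"]
amount = [10.0, 12.0, 9.0, 20.0, 18.0]

def rle(column):
    """Run-length encode a column as [value, run_length] pairs."""
    runs = []
    for v in column:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs

region_rle = rle(region)
print("compressed region column:", region_rle)  # [['north', 3], ['south', 2]]

# SELECT SUM(amount) GROUP BY region: walk the runs of the grouping column
# and the matching stretch of the measure column; no rows are reconstructed
# and no other column is ever read.
pos, totals = 0, {}
for value, length in region_rle:
    totals[value] = sum(amount[pos:pos + length])
    pos += length
print("sum(amount) by region:", totals)
```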
23

Afonso, Jorge Miguel Dias. "Optimização de estruturas multidimensionais de dados em ambientes OLAP." Master's thesis, 2009. http://hdl.handle.net/10071/3858.

Abstract:
The growth of Data Warehouses in size and use has imposed a continuous strain on OLAP systems. The materialization of multidimensional data structures has, from early times, been a way of improving the response time of those systems to aggregate queries. In addition to time, it is necessary to consider another perspective: the space required to store all the calculated aggregates. In practice, the multidimensional data structure selection problem is mostly a matter of choosing the views that yield the greatest decrease in query and maintenance costs, according to the cuboids most useful for answering the users' queries. Balancing time against space is recognized as an NP-hard problem. In fact, many decision support systems pre-compute multidimensional data structures in order to optimize the response time of the queries made by decision makers. However, computing all the cuboids of a multidimensional data structure is nearly infeasible when confronted with the high dimensionality and cardinality inherent to the complexity of modern Data Warehouse and OLAP systems, in addition to their recognized requirements of time and space. Partial materialization, on the other hand, offers an interesting trade-off between storage space and the pre-computation time of materialized views. In this work, we discuss some partial materialization techniques for improving the computation and selection of the most valuable cuboids of a multidimensional data structure, known as "iceberg" algorithms, in response to the full view materialization problem. In essence, these algorithms calculate only the fraction of the cells in a multidimensional data structure whose aggregate value is above some minimum support threshold, in order to identify the aggregates worth considering in decision support analyses (qualifying only the aggregates with more analytical meaning and, therefore, those that should be materialized). As a result of this research, different algorithms for the view selection problem are analyzed, with special emphasis on the iceberg selection logics. Besides the multidimensional characterization (in time and space) of the proposed solutions, this work identifies their most prominent advantages and the delicate points that deserve special attention.
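The iceberg condition itself is a one-line filter: materialize a group-by cell only if its aggregate clears the minimum support threshold. A brute-force sketch over a tiny fact table (invented data):

```python
from collections import Counter
from itertools import combinations

# Invented fact rows: (product, city, quarter).
rows = [
    ("tv", "lisbon", "Q1"), ("tv", "lisbon", "Q1"), ("tv", "porto", "Q2"),
    ("radio", "lisbon", "Q1"), ("radio", "porto", "Q2"), ("tv", "lisbon", "Q2"),
]
MIN_SUP = 2  # iceberg threshold: keep only cells with count >= 2

# Enumerate every cuboid (every subset of the three dimensions) and keep
# only the cells whose count clears the threshold.
for k in range(1, 4):
    for axes in combinations(range(3), k):
        cells = Counter(tuple(row[a] for a in axes) for row in rows)
        kept = {cell: n for cell, n in cells.items() if n >= MIN_SUP}
        print(f"cuboid {axes}: {kept}")
```

Real iceberg algorithms such as BUC exploit the anti-monotonicity of the count — a cell below the threshold cannot have a finer-grained descendant above it — to prune whole subtrees instead of enumerating every cuboid as this sketch does.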
24

Magaia, Lúcia Pires Torrão. "O papel dos sistemas de suporte à decisão na análise da qualidade da água." Master's thesis, 2009. http://hdl.handle.net/1822/11492.

Abstract:
Master's dissertation in Data Systems and Analytical Processing
The guarantee of the water quality for human consumption is an essential element of the public health politics. Monitoring the quality water in the treatment stations is based on analytical methods which are more and more sophisticated and efficient using mechanisms that automatically register reliable data from the diverse parameters which influence the quality of the water. Taking into account the amount of stored data, it is essential to provide the institutions with high technological means that will give the users, not only a quick access to information of quality, but also the capacity of making an analysis, as well as a visualization regarding their needs. The proposal of developing a decision support system for the analysis of the water quality in a water treatment unit, which is the subject of this thesis, aims at collecting, providing structures and means for the multidimensional exploration of the data, as well as its classification and at the subsequent appearance of models through the mechanisms of Data Mining. In relation to the mechanisms of Data Mining, we suggest an approach for the prevision of water quality, using the identification of some models of prevision, resorting to the exploration of decision trees for the classification that allows us to foresee some possible risky situations as far as the quality of water is concerned and analysing the benefits of the usage of the Data Mining techniques in this domain. The results obtained from these experiments allow us to conclude that this study may be useful to those responsible in the area and contributing for development of tools which are essential for the decision-making.
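As an illustration of the classification approach described in the abstract, the following minimal sketch trains a decision tree with scikit-learn; the parameters (pH, turbidity, free chlorine) and the labelled samples are hypothetical, not the dissertation's dataset.

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training samples: [pH, turbidity (NTU), free chlorine (mg/L)],
# each labelled by an analyst as a safe or risky water-quality reading.
X = [[7.1, 0.4, 0.60], [6.2, 4.8, 0.10], [7.4, 0.2, 0.80], [5.9, 6.3, 0.05]]
y = ["safe", "risk", "safe", "risk"]

model = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(model.predict([[6.5, 3.9, 0.2]]))  # anticipate a possible risk situation
```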
Styles: APA, Harvard, Vancouver, ISO, etc.
25

Silva, Dário Almeno Matos da. "Inclusão de funcionalidades MapReduce em sistemas de data warehousing." Master's thesis, 2013. http://hdl.handle.net/1822/28079.

Full text of the source
Abstract:
Master's dissertation in Informatics Engineering
In general, the process of data acquisition in organisations has gradually become easier. Given the current proliferation of data, new processing strategies have emerged that aim to achieve better performance in data analysis. MapReduce is a programming model dedicated to processing large data sets that puts into practice many of the principles of parallel and distributed computing. The model aims to open parallel and distributed systems to less experienced programmers, so that they can benefit from their storage and data processing capabilities. Frameworks based on this programming model already occupy a prominent position in the market, especially in the segment devoted to the analysis of unstructured data such as text documents or log files. In practice, the storage of multidimensional data structures and the ability to perform on-the-fly calculations with short execution times are major challenges that modern data warehousing systems must also face. Indeed, over recent decades, performance optimisation techniques have emerged to meet the most common needs of decision makers. The multidimensional space is typically supported by a relational database management system through a star schema. Likewise, alternative solutions to these systems, such as Bigtable, and the emergence of data warehousing technologies based on MapReduce, such as Apache Hive and Apache Pig, play an increasingly relevant role. This dissertation analyses several techniques for optimising the performance of a multidimensional data system, based on the storage and query processing characteristics that MapReduce offers nowadays. The fundamental principle of these techniques is to structure the data in the data warehouse so as to ease its maintenance and obtain excellent query performance, while respecting the limitations imposed by the MapReduce programming model. Additionally, this dissertation presents and describes a process for adapting a conventional data warehouse structure to a MapReduce-based one, analysing its most relevant aspects.
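To make the programming model concrete, this single-process Python sketch emulates the map, shuffle and reduce phases of a GROUP BY-style aggregation over star-schema fact rows, the kind of job that systems such as Apache Hive compile such queries into; the fact records and field names are hypothetical.

```python
from collections import defaultdict

def map_fact(row):
    """Map phase: emit (dimension key, measure) pairs from one fact row."""
    yield (row["store"], row["sales"])

def reduce_sales(key, values):
    """Reduce phase: aggregate all measures sharing the same key."""
    return key, sum(values)

facts = [{"store": "lisboa", "sales": 10.0},
         {"store": "porto", "sales": 4.0},
         {"store": "lisboa", "sales": 6.0}]

groups = defaultdict(list)
for row in facts:                       # shuffle: group map output by key
    for key, value in map_fact(row):
        groups[key].append(value)
print([reduce_sales(k, v) for k, v in groups.items()])
# [('lisboa', 16.0), ('porto', 4.0)]
```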
Styles: APA, Harvard, Vancouver, ISO, etc.
26

Schwarz, Holger [Verfasser]. "Integration von Data-Mining und online analytical processing : eine Analyse von Datenschemata, Systemarchitekturen und Optimierungsstrategien / vorgelegt von Holger Schwarz." 2003. http://d-nb.info/968816657/34.

Full text of the source
Styles: APA, Harvard, Vancouver, ISO, etc.
27

Correia, João Ricardo Rodrigues. "Projeto UC-Num: desenvolvimento de uma Data Warehouse para a Universidade de Coimbra - Estágio B." Master's thesis, 2015. http://hdl.handle.net/10316/35684.

Full text of the source
Abstract:
Master's dissertation in Informatics Engineering presented to the Faculty of Sciences and Technology of the University of Coimbra
For large organisations such as the University of Coimbra, decision support systems are of enormous importance for strategic and even operational management. The analysis of management indicators is an important decision-making task in any organisation. At UC this task is being hindered by problems in calculating the indicators and by delays in collecting them; these problems can delay important decisions or mislead UC's stakeholders. The objective of this internship is to develop a Business Intelligence solution for producing indicators in UC's Human Resources area. To achieve this goal, I joined a team developing systems for the automatic production, visualisation and monitoring of indicators across the most varied areas of UC. The development process involves building a data mart, a subset of a data warehouse usually dedicated to a specific subject. Using the multidimensional model it is possible to perform OLAP analyses, given the speed and effectiveness this model provides. The activities carried out in this internship include the identification of new indicators, meetings with users and managers, requirements identification, design of the data models, development of software and ETL plans, implementation, verification and validation of the solution, as well as writing documentation.
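As a rough illustration of the data mart loading step described above (hypothetical HR source fields, not the UC schema), this sketch conforms a dimension, assigns surrogate keys and loads a small star-schema fact table.

```python
dim_unit, fact_headcount = {}, []

def surrogate(unit_name):
    """Return the surrogate key for a unit, creating it on first sight."""
    if unit_name not in dim_unit:
        dim_unit[unit_name] = len(dim_unit) + 1
    return dim_unit[unit_name]

# Hypothetical extract from an HR source system.
source = [{"unit": "FCTUC", "year": 2014, "staff": 120},
          {"unit": "FLUC", "year": 2014, "staff": 85}]

for row in source:                      # transform and load
    fact_headcount.append({"unit_key": surrogate(row["unit"]),
                           "year": row["year"],
                           "headcount": row["staff"]})
print(dim_unit, fact_headcount)
```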
Styles: APA, Harvard, Vancouver, ISO, etc.
28

Gouveia, Ivo Emanuel Ferreira. "Projeto UC-Num: desenvolvimento de uma DataWarehouse para a Universidade de Coimbra." Master's thesis, 2016. http://hdl.handle.net/10316/97344.

Full text of the source
Abstract:
Master's dissertation in Informatics Engineering presented to the Faculty of Sciences and Technology of the University of Coimbra.
To achieve good management in large organisations it is increasingly important to make informed decisions, and the University of Coimbra is no exception. Currently, collecting key performance indicators (KPIs) is difficult and time-consuming work that in many cases can delay management decisions or, in the worst case, yield information that is already outdated. To solve this problem, the University of Coimbra launched the UC-Num project, whose main objective is to build a business intelligence solution that calculates, automatically and in useful time, the various indicators needed for the sound management of the university. The objective of this internship is to create a new module, with indicators from the Social Support Services area, to add to those already existing in the current system. The aim is thus to build a data mart specific to UC, carrying out all the typical steps in the development of a business intelligence system, such as gathering indicators, creating an ETL plan and developing interactive dashboards for accessing the analysis of the indicators.
Styles: APA, Harvard, Vancouver, ISO, etc.
29

Sêco, Milton Jorge Martins. "Projecto DW-UC – Desenvolvimento de uma Data Warehouse para a Universidade de Coimbra." Master's thesis, 2014. http://hdl.handle.net/10316/35680.

Full text of the source
Abstract:
Master's dissertation in Informatics Engineering presented to the Faculty of Sciences and Technology of the University of Coimbra
Nowadays, decision support systems are fundamental to the development and success of countless organisations around the world, making the demand and need for such systems ever greater. The University of Coimbra, in order to fulfil its strategic and action plans, wants a tool that gives its management bodies easy access to information for observing the university's performance and evolution, and thus for deciding which actions and directions to take. From this need the "DW-UC" project was born, aiming at the construction of a data warehouse in which information from several data sources existing within UC is processed and analysed. This information is in turn used to calculate various performance indicators that make it possible to monitor and evaluate UC's performance in several respects. This report illustrates the work carried out in this project, with special emphasis on Human Resources, the area assigned to me among the several divisions of the data warehouse and its OLAP analysis.
Styles: APA, Harvard, Vancouver, ISO, etc.
30

Pinto, Adolfo Joaquim Santos Fonseca. "Projeto UC-Num: Desenvolvimento de uma Data Warehouse para a Universidade de Coimbra." Master's thesis, 2019. http://hdl.handle.net/10316/87923.

Full text of the source
Abstract:
Master's dissertation in Informatics Engineering presented to the Faculty of Sciences and Technology
Given the heterogeneity and quantity of data existing at the University of Coimbra, it became necessary to develop a system to support decision-making. This system, part of the UC-Num project, provides OLAP (Online Analytical Processing) analysis based on a data warehouse. Through a web interface, with interactive dashboards presenting analyses of KPIs (Key Performance Indicators), the system aims to support decision-making in specific areas of the university. The objective of this internship is to develop a component of the data warehouse, together with a system for producing analyses for the University of Coimbra. These analyses provide essential information to the university's administrative staff, and the stakeholders play a major role both in the design of each indicator and in its final validation. The indicators, part of the university's strategic and action plan, were given an initial definition in order to understand how the data supporting them could be collected. Through these indicators and specific metrics it is possible to know how the institution is behaving and whether it is meeting its proposed objectives. Once available on the platform, the indicators result from collecting the data, modelling it, automating the data-collection process and publishing it on a web page. As this internship is a BI (Business Intelligence) project, it follows the typical development of such a project according to the methodology recommended by Ralph Kimball, one of the pioneers of data warehousing; following his methodology ensures that the project deals with requirements, data modelling, implementation and deployment in a tested and successful way.
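As a minimal illustration of indicator computation (a hypothetical admission-rate KPI, not one of UC's real indicators), the sketch below derives a yearly value from fact rows and checks it against a strategic-plan target.

```python
# Hypothetical fact rows and a target value taken from a strategic plan.
facts = [{"year": 2018, "applications": 900, "admissions": 540},
         {"year": 2019, "applications": 960, "admissions": 600}]
TARGET = 0.60

for row in facts:
    kpi = row["admissions"] / row["applications"]   # the indicator itself
    status = "met" if kpi >= TARGET else "missed"
    print(f"{row['year']}: admission rate {kpi:.2f} (target {status})")
```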
Styles: APA, Harvard, Vancouver, ISO, etc.
31

Banda, Misheck. "A data management and analytic model for business intelligence applications." Diss., 2017. http://hdl.handle.net/10500/23129.

Full text of the source
Abstract:
Most organisations use several data management and business intelligence solutions, on-premise and/or cloud-based, to manage and analyse their constantly growing business data. Challenges faced by organisations nowadays include, but are not limited to, growth limitations, big data, and inadequate analytics, computing and data storage capabilities. Although these organisations are in most cases able to generate reports and dashboards for decision-making, effective use of their business data together with an appropriate business intelligence solution could achieve and sustain informed decision-making and allow a competitive reaction to the dynamic external environment. A data management and analytic model is proposed on which organisations can rely for decisive guidance when planning to procure and implement a unified business intelligence solution. To arrive at a sound model, the literature was reviewed by studying business intelligence extensively and by exploring and developing various deployment models and architectures (naïve, on-premise and cloud-based), which revealed their benefits and challenges. The outcome of the literature review was the development of a hybrid business intelligence model and its accompanying architecture, the main contribution of the study. In order to assess the state of business intelligence utilisation, and to validate and improve the proposed architecture, two case studies targeting users and experts were conducted using quantitative and qualitative approaches. The case studies established that the decision to procure and implement a successful business intelligence solution rests on a number of crucial elements, such as applications, devices, tools, business intelligence services, data management and infrastructure. The findings further indicated that the proposed hybrid architecture is a suitable solution for managing complex organisations with serious data challenges.
Computing
M. Sc. (Computing)
Styles: APA, Harvard, Vancouver, ISO, etc.
32

Duarte, Ana Sofia da Silva. "Materialização à medida de vistas multidimensionais de dados." Master's thesis, 2012. http://hdl.handle.net/1822/27823.

Full text of the source
Abstract:
Master's dissertation in Informatics Engineering
With the emergence of the information era, many companies turned to data warehouses to store the growing amount of data they hold about their businesses. With this growth in data volume comes the need to explore it better, so that it can be useful in business evaluation and decision-making. Online analytical processing (OLAP) systems answer this need by helping the business analyst explore and evaluate the data, giving him exploration autonomy through a multi-perspective, fast-response structure. However, for access to this information to be fast, multidimensional structures with pre-computed data must be materialized, reducing query time to the time needed to read the answer and avoiding the processing time of each query. Complete materialization of the required data is impracticable in practice, given the volume of data these systems are subject to and the processing time needed to compute all possible combinations. Since the business analyst is the differentiating element in the effective use of these structures, or at least the one who selects the data consulted in them, this work proposes a set of techniques that study user behaviour, in order to understand its seasonal patterns and the target views of the analyst's explorations, so that new structures can be defined containing the views most appropriate for materialization, thus better satisfying the users' exploration needs. This dissertation defines structures that collect the users' query records and, over this data, applies techniques for identifying usage profiles and usage patterns, namely the definition of OLAP sessions, the application of Markov chains, and the determination of equivalence classes of queried attributes. Finally, we propose the definition of an OLAP signature capable of describing a user's OLAP behaviour from the elements identified by the studied techniques, thereby allowing the system administrator to restructure the multidimensional structures "to the measure" of the use made by the analysts.
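As an illustration of the Markov chain technique mentioned in the abstract (a sketch under assumed view names, not the dissertation's implementation), the fragment below estimates first-order transition probabilities between queried views from logged OLAP sessions and suggests the most likely next view, a hint for deciding what to materialize.

```python
from collections import Counter, defaultdict

# Logged OLAP sessions as sequences of queried views (hypothetical names).
sessions = [["sales_by_region", "sales_by_product", "sales_by_region"],
            ["sales_by_region", "sales_by_product"],
            ["sales_by_product", "sales_by_store"]]

transitions = defaultdict(Counter)
for session in sessions:                      # first-order Markov counts
    for current, nxt in zip(session, session[1:]):
        transitions[current][nxt] += 1

def most_likely_next(view):
    """Most probable next view after `view`: a materialization hint."""
    counts = transitions[view]
    nxt, n = counts.most_common(1)[0]
    return nxt, n / sum(counts.values())

print(most_likely_next("sales_by_region"))    # ('sales_by_product', 1.0)
```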
Styles: APA, Harvard, Vancouver, ISO, etc.
33

Silva, Ricardo Manuel Arantes. "Definição e caracterização de assinaturas OLAP." Master's thesis, 2017. http://hdl.handle.net/1822/62494.

Full text of the source
Abstract:
Integrated master's dissertation in Informatics Engineering
OLAP signatures can be seen as a way of characterizing a given analytical exploration profile. Unlike a typical exploration profile, however, an OLAP signature is not static: it uniquely brings together all the information elements collected over time across the various OLAP exploration sessions carried out by a given user, characterizing that user in a very concrete way as time goes on. In an OLAP system, signatures can be used to trace a user's data exploration profile, based on the queries the user submits over time to an analytical processing system and on his exploration habits and trends. Through the analysis of OLAP signatures we can optimize the multidimensional structures (cubes) of an analytical system so as to reduce their size, keeping only relevant information, and predict which operations may be triggered after the occurrence of a given query. In this way it is possible to choose a priori which parts of the cube should be loaded into memory, or which can be transferred to the user's own machine, thereby minimizing the server load and reducing the data traffic in the communication system that supports the analytical exploration processes. In this dissertation we explore this theme and define a sustained method for the definition and maintenance of OLAP signatures.
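As a rough sketch of how such a signature might be represented (our assumption, not the dissertation's model), the fragment below keeps attribute-usage weights updated session by session with exponential decay, so the signature evolves instead of staying static; the top-weighted attributes hint at which parts of the cube to keep in memory.

```python
DECAY = 0.8   # hypothetical: how strongly older sessions are discounted

def update_signature(signature, session_attrs):
    """Decay old evidence, then reinforce the attributes just queried."""
    signature = {attr: w * DECAY for attr, w in signature.items()}
    for attr in session_attrs:
        signature[attr] = signature.get(attr, 0.0) + 1.0
    return signature

sig = {}
for session in [{"region", "product"}, {"region"}, {"region", "store"}]:
    sig = update_signature(sig, session)

# Rank attributes: the heaviest ones are candidates for preloading.
print(sorted(sig.items(), key=lambda kv: -kv[1]))
```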
Styles: APA, Harvard, Vancouver, ISO, etc.