Dissertations / Theses on the topic 'Multidimensional data mining'




Consult the top 31 dissertations / theses for your research on the topic 'Multidimensional data mining.'


You can also download the full text of each publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Torre, Fabrizio. "3D data visualization techniques and applications for visual multidimensional data mining." Doctoral thesis, Università degli Studi di Salerno, 2014. http://hdl.handle.net/10556/1561.

Full text
Abstract:
2012 - 2013
Although modern technology provides new tools to measure the world around us, we are quickly generating massive amounts of high-dimensional, spatio-temporal data. In this work, I deal with two types of datasets: one in which the spatial characteristics are relatively dynamic and the data are sampled at different periods of time, and another in which many dimensions prevail, although the spatial characteristics are relatively static. The first dataset concerns a peculiar source of uncertainty arising from the contractual relationships that regulate a project's execution: dispute management. In recent years the projects managed by public and private organizations have grown in size and complexity. This increases the probability of project failure, frequently due to the difficulty of achieving objectives such as on-time delivery, cost containment, and expected quality. In particular, one of the most common causes of project failure is the very high degree of uncertainty that affects the expected performance of the project, especially when different stakeholders with divergent aims and goals are involved...[edited by author]
XII n.s.
2

Nimmagadda, Shastri Lakshman. "Ontology based data warehousing for mining of heterogeneous and multidimensional data sources." Thesis, Curtin University, 2015. http://hdl.handle.net/20.500.11937/2322.

Full text
Abstract:
Heterogeneous and multidimensional big-data sources are prevalent in virtually all business environments, yet system and data analysts struggle to access and process them quickly. A robust and versatile data warehousing system is developed that integrates domain ontologies from multidimensional data sources. For example, petroleum digital ecosystems and digital oilfield solutions, derived from big-data petroleum (information) systems, are in increasing demand in multibillion-dollar resource businesses worldwide. This work has been recognized by the IEEE Industrial Electronics Society and has appeared in more than 50 international conference proceedings and journals.
3

Wu, Hao-cun, and 吳浩存. "A multidimensional data model for monitoring web usage and optimizing website topology." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2004. http://hub.hku.hk/bib/B29528215.

Full text
4

Peterson, Angela R. "Visual data mining: Using parallel coordinate plots with K-means clustering and color to find correlations in a multidimensional dataset." Instructions for remote access, 2009. http://www.kutztown.edu/library/services/remote_access.asp.

Full text
5

Ding, Guoxiang. "DERIVING ACTIVITY PATTERNS FROM INDIVIDUAL TRAVEL DIARY DATA: A SPATIOTEMPORAL DATA MINING APPROACH." The Ohio State University, 2009. http://rave.ohiolink.edu/etdc/view?acc_num=osu1236777859.

Full text
6

Li, Hsin-Fang. "DATA MINING AND PATTERN DISCOVERY USING EXPLORATORY AND VISUALIZATION METHODS FOR LARGE MULTIDIMENSIONAL DATASETS." UKnowledge, 2013. http://uknowledge.uky.edu/epb_etds/4.

Full text
Abstract:
Oral health problems have been a major public health concern, profoundly affecting people's general health and quality of life. Given that oral health data comprise several measurable dimensions, including clinical measurements, socio-behavioral factors, genetic predispositions, self-reported assessments, and quality-of-life measures, strategies for analyzing such multidimensional data are neither computationally straightforward nor efficient, and researchers face major challenges in identifying tools that avoid manually probing the data. The purpose of this dissertation is to apply the proposed methodology to oral health-related data in ways that go beyond identifying risk factors from a single dimension, and to describe large-scale datasets in a natural, intuitive manner. The three applications focus on 1) classification and regression trees (CART) to understand the multidimensional factors associated with untreated decay in childhood, 2) network analyses and network plots to describe the connectedness of concurrent co-morbid conditions for pediatric patients with autism receiving dental treatment under general anesthesia, and 3) random forests, in addition to conventional adjusted main-effects analyses, to identify potential environmental risk factors and interactive effects for periodontitis. Compared with the previous literature, these applications yield both overlapping findings and novel contributions to oral health knowledge. The results not only illustrate that these data mining techniques can improve the transformation of information into knowledge, but also provide new avenues for future decision making and planning in oral health-care management.
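For readers unfamiliar with the third technique, the following is a minimal sketch of how a random forest can rank candidate risk factors for a binary outcome. The feature names and data are invented stand-ins, not the dissertation's variables.

```python
# Sketch: a random forest ranking hypothetical risk factors for a
# binary oral-health outcome via its feature importances.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))            # hypothetical: age, plaque, fluoride, SES
y = (X[:, 1] + 0.5 * X[:, 3] + rng.normal(scale=0.5, size=500) > 0).astype(int)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
for name, score in zip(["age", "plaque", "fluoride", "ses"], forest.feature_importances_):
    print(f"{name}: {score:.3f}")        # higher score = used more often by the trees
```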
7

Kucuktunc, Onur. "Result Diversification on Spatial, Multidimensional, Opinion, and Bibliographic Data." The Ohio State University, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=osu1374148621.

Full text
8

Foltýnová, Veronika. "Multidimenzionální analýza dat a zpracování analytického zobrazení." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2018. http://www.nusl.cz/ntk/nusl-376922.

Full text
Abstract:
This thesis deals with the analysis and display of multidimensional data. The theoretical part introduces data mining, its tasks and techniques, briefly explains the terms Business Intelligence and data warehouse, and describes databases. The options for displaying multidimensional data are then described. The theoretical part closes with a brief explanation of optical networks, particularly the Gigabit Passive Optical Network (GPON) and its frames, because the application displays data extracted from frames of this network. The practical part covers the creation of a source database and of an application that builds an OLAP cube and displays multidimensional data. This application is based on the theoretical background of multidimensional databases and OLAP technology.
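As a rough illustration of the OLAP idea the application implements, the sketch below builds a small cube-style aggregation with pandas. The column names are hypothetical, not taken from the thesis's GPON data model.

```python
# Minimal sketch of an OLAP-style roll-up ("cube" slice) with pandas.
import pandas as pd

frames = pd.DataFrame({
    "onu":   ["A", "A", "B", "B", "A", "B"],   # hypothetical dimension: optical unit
    "alloc": ["voice", "data", "voice", "data", "data", "voice"],
    "bytes": [120, 900, 80, 1500, 640, 70],    # the measure being aggregated
})

# Aggregate the fact table along two dimensions, like a 2-D slice of a cube.
cube = frames.pivot_table(index="onu", columns="alloc",
                          values="bytes", aggfunc="sum", margins=True)
print(cube)
```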
9

Nunes, Santiago Augusto. "Análise espaço-temporal de data streams multidimensionais." Universidade de São Paulo, 2015. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-17102016-152137/.

Full text
Abstract:
Data streams are usually characterized by large amounts of data generated continuously, in synchronous or asynchronous and potentially infinite processes, in applications such as meteorological systems, industrial processes, vehicle traffic, financial transactions, and sensor networks. In addition, the behavior of the data tends to change significantly over time, defining evolving data streams. These changes may correspond to temporary events (such as anomalies or extreme events) or to relevant changes in the process generating the stream (which result in changes in the distribution of the data). Furthermore, these data sets can have spatial characteristics, such as the geographic location of sensors, which can be useful in the analysis process. Detecting these behavioral changes while considering both temporal evolution and the spatial characteristics of the data is relevant for some types of applications, such as the monitoring of extreme weather events in Agrometeorology research. In this context, this master's project proposes a technique to support spatio-temporal analysis of multidimensional data streams containing spatial and non-spatial information. The adopted approach is based on concepts from Fractal Theory, used for temporal behavior analysis, together with techniques for handling data streams and hierarchical data structures, allowing analysis tasks that take the spatial and non-spatial aspects into account simultaneously. The technique was applied to agrometeorological data to identify distinct behaviors across sub-regions defined by the spatial characteristics of the data. The results of this work therefore include contributions to the data mining area and support for research in Agrometeorology.
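The abstract does not give the exact algorithm, but the following sketch shows one common fractal measure in this spirit: a Grassberger-Procaccia style correlation-dimension estimate computed over one window of a multidimensional stream. It is an illustrative simplification, not the thesis's method.

```python
# Sketch: estimate the correlation (fractal) dimension of a data window
# as the slope of log C(r) vs log r, where C(r) is the fraction of
# point pairs closer than r.
import numpy as np

def correlation_dimension(points, radii):
    n = len(points)
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    counts = [(d < r).sum() - n for r in radii]      # drop the n self-pairs
    c = np.array(counts, dtype=float) / (n * (n - 1))
    slope, _ = np.polyfit(np.log(radii), np.log(c), 1)
    return slope

rng = np.random.default_rng(1)
window = rng.random((400, 3))                        # one window of a 3-D stream
# Roughly 3 for uniformly filled 3-D data; lower for clustered/fractal data.
print(correlation_dimension(window, radii=np.logspace(-1.1, -0.6, 6)))
```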
10

Nieto, Erick Mauricio Gómez. "Projeção multidimensional aplicada a visualização de resultados de busca textual." Universidade de São Paulo, 2012. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-05122012-105730/.

Full text
Abstract:
Internet users are very familiar with the results of a search query displayed as a ranked list of snippets. Each textual snippet shows a content summary of the referred document (or web page) and a link to it. This display has many advantages: for example, it affords easy navigation and is straightforward to interpret. Nonetheless, any user of search engines could report some experience of disappointment with this metaphor. Indeed, it has limitations in particular situations, as it fails to provide an overview of the retrieved document collection. Moreover, depending on the nature of the query - e.g., it may be too general, ambiguous, or poorly expressed - the desired information may be poorly ranked, or the results may span varied topics. Several search tasks would be easier if users were shown an overview of the returned documents, organized so as to reflect how related they are, content-wise. We propose a visualization technique to display the results of web queries aimed at overcoming such limitations. It combines the neighborhood preservation capability of multidimensional projections with the familiar snippet-based representation, employing a multidimensional projection to derive two-dimensional layouts of the query results that preserve text similarity relations, or neighborhoods. Similarity is computed by applying the cosine similarity over a bag-of-words vector representation of the collection built from the snippets. If the snippets are displayed directly according to the derived layout, they overlap considerably, producing a poor visualization. We overcome this problem by defining an energy functional that considers both the overlap amongst snippets and the preservation of the neighborhood structure given by the projected layout. Minimizing this energy functional provides a neighborhood-preserving two-dimensional arrangement of the textual snippets with minimum overlap. The visualization conveys both a global view of the query results and visual groupings that reflect related results, as illustrated in several examples.
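A minimal sketch of the pipeline the abstract describes, with classical MDS standing in for the projection technique; the thesis's own projection and overlap-removal step are not reproduced here, and the snippets are toy examples.

```python
# Sketch: bag-of-words snippets -> cosine dissimilarity -> 2-D layout.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_distances
from sklearn.manifold import MDS

snippets = [
    "multidimensional projection of search results",
    "projection preserves neighborhoods of documents",
    "cooking recipes with garlic and olive oil",
    "olive oil recipes for quick dinners",
]
vectors = TfidfVectorizer().fit_transform(snippets)
dist = cosine_distances(vectors)

layout = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(dist)
print(layout)   # similar snippets land near each other in 2-D
```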
11

Ivaškevičius, Klaidas. "Daugiamačių sekų šablonų analizė." Master's thesis, Lithuanian Academic Libraries Network (LABT), 2014. http://vddb.library.lt/obj/LT-eLABa-0001:E.02~2012~D_20140630_173416-93518.

Full text
Abstract:
The main goal of this master's thesis was to review some algorithms and their combinations for multidimensional sequential pattern mining and to implement an algorithm capable of performing it. The FP-Tree structure, used to store critical (for example, frequently repeated) data compactly, is described, and the FP-Growth algorithm, which analyzes this data structure and outputs the set of all frequent patterns, is presented. The MD-PS-FPG algorithm - a combination of modified FP-Growth and PrefixSpan algorithms - is introduced, along with results of several tests and the main objectives for further work.
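For context, this is what plain FP-Growth looks like with an off-the-shelf implementation; the sketch assumes the mlxtend package is available, and MD-PS-FPG extends this idea to multidimensional sequences.

```python
# Sketch: frequent itemsets via FP-Growth (mlxtend implementation assumed).
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth

transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions),
                      columns=te.columns_)   # one boolean column per item
print(fpgrowth(onehot, min_support=0.6, use_colnames=True))
```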
12

Nogueira, Rodrigo Ramos. "Newsminer: um sistema de data warehouse baseado em texto de notícias." Universidade Federal de São Carlos, 2017. https://repositorio.ufscar.br/handle/ufscar/9138.

Full text
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Data and text mining applications that manage Web data have been the subject of recent research. In every case, data mining tasks need to work on clean, consistent, and integrated data to obtain the best results; thus, Data Warehouse environments are a valuable source of clean, integrated data for data mining applications. Data Warehouse technology has evolved to retrieve and process data from the Web. In particular, news websites are rich sources of text that can compose a linguistic corpus. By inserting the corpus into a Data Warehousing environment, applications can take advantage of the flexibility that a multidimensional model and OLAP operations provide. Among the benefits are navigation through the data, selection of the part of the data considered relevant, data analysis at different levels of abstraction, and aggregation, disaggregation, rotation, and filtering over any set of data. This work presents Newsminer, a data warehouse environment that provides a consistent and clean set of texts, in the form of a multidimensional corpus, for consumption by external applications and users. The proposal includes an architecture that integrates the gathering of news in near real time, a semantic enrichment module as part of the ETL stage, which adds semantic properties to the data such as news category and POS-tagging annotation, and access to data cubes for consumption by applications and users. Two experiments were performed. The first selects the best news-category classifier for the semantic enrichment module; statistical analysis of the results indicated that the Perceptron classifier achieved the best F-measure with good computational time. The second collected data to evaluate real-time news preprocessing; for the data set collected, the results indicated that online processing time is achievable.
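A hedged sketch of the kind of news-category classifier the first experiment compares: a Perceptron over bag-of-words features. The toy texts and labels below are invented, not from the Newsminer corpus.

```python
# Sketch: TF-IDF features + Perceptron for news-category classification.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Perceptron
from sklearn.pipeline import make_pipeline

train_texts = ["striker scores twice in the final",
               "central bank raises interest rates",
               "team wins the championship match",
               "markets react to inflation report"]
train_labels = ["sports", "economy", "sports", "economy"]

clf = make_pipeline(TfidfVectorizer(), Perceptron(random_state=0))
clf.fit(train_texts, train_labels)
print(clf.predict(["the team wins the final match"]))  # likely ['sports'], given the word overlap
```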
13

Egho, Elias. "Extraction de motifs séquentiels dans des données séquentielles multidimensionnelles et hétérogènes : une application à l'analyse de trajectoires de patients." Thesis, Université de Lorraine, 2014. http://www.theses.fr/2014LORR0066/document.

Full text
Abstract:
All domains of science and technology produce large and heterogeneous data. Although a lot of work has been done in this area, mining such data is still a challenge, and no previous research work targets the mining of heterogeneous multidimensional sequential data. This thesis contributes to knowledge discovery in heterogeneous sequential data along three research directions: (i) extraction of sequential patterns, (ii) classification, and (iii) clustering of sequential data. First, we generalize the notion of a multidimensional sequence by considering complex and heterogeneous sequential structure. We present a new approach called MMISP to extract sequential patterns from heterogeneous sequential data. MMISP generates a large number of sequential patterns, as is usually the case for pattern enumeration algorithms. To overcome this problem, we propose a novel way of considering heterogeneous multidimensional sequences by mapping them into pattern structures, and we develop a framework for enumerating only the patterns that satisfy given constraints. The second research direction concerns the classification of heterogeneous multidimensional sequences. We use Formal Concept Analysis (FCA) as a classification method and show interesting properties of concept lattices and of the stability index for classifying sequences into a concept lattice and selecting interesting groups of sequences. The third research direction concerns the clustering of heterogeneous multidimensional sequential data. We focus on the notion of common subsequences to define a similarity measure between a pair of sequences composed of a list of itemsets, and we use it to build a similarity matrix between sequences and to separate them into groups. We present theoretical results and an efficient dynamic programming algorithm that counts the number of common subsequences between two sequences without enumerating all subsequences. The resulting system was applied to analyze and mine patient healthcare trajectories in oncology, using data from a medico-administrative database covering hospitalizations in the Lorraine region (France). The system identifies and characterizes episodes of care for specific sets of patients; the results were discussed and validated with domain experts.
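The counting idea can be illustrated with the textbook dynamic program for sequences of atomic items; the thesis generalizes this to sequences of itemsets.

```python
# Sketch: count the common subsequences of two sequences by dynamic
# programming, without enumerating them (textbook version for atomic items).
def count_common_subsequences(a, b):
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j] + dp[i][j - 1] + 1
            else:
                dp[i][j] = dp[i - 1][j] + dp[i][j - 1] - dp[i - 1][j - 1]
    return dp[n][m]

# More common subsequences = more similar trajectories.
print(count_common_subsequences("abcd", "abce"))  # 7: a, b, c, ab, ac, bc, abc
```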
14

Paulovich, Fernando Vieira. "Mapeamento de dados multi-dimensionais - integrando mineração e visualização." Universidade de São Paulo, 2008. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-04032009-145018/.

Full text
Abstract:
Projection or point placement techniques, useful for mapping multidimensional data into visual spaces, have always raised interest in the visualization and data analysis communities because they support data exploration based on similarity or correlation relations. Regardless of that interest, various problems arise when dealing with such techniques, impairing their widespread application. In particular, the projections that yield the highest-quality layouts have prohibitive computational cost for large data sets. Additionally, there are issues regarding visual scalability, i.e., the capability of visually fitting the individual points in the exploration space as the data set grows large. This thesis treats the problems of projections from various perspectives, presenting novel techniques that solve, to a certain extent, several of the verified problems. The size and complexity of data sets also suggest integrating data mining capabilities into the visualization pipeline, both during the mapping process and as tools to extract additional information after the data have been laid out. This thesis therefore adds some aspects of mining to the multidimensional visualization process, mainly for the analysis of document collections, proposing and implementing an approach for topic extraction. As supporting tools for testing these techniques and comparing them to existing ones, different software systems were written. The main one includes the techniques developed here as well as several classical projection and dimensionality reduction techniques, and can be used for exploring various kinds of data sets, with additional functionality to support the mapping of document collections. This thesis contributes to the understanding of the projection or mapping problem and develops new techniques that are fast, treat adequately the visual formation of groups of highly related data items, separate those groups properly, and allow exploration of the data at various levels of detail.
15

Salazar, Frizzi Alejandra San Roman. "Um estudo sobre o papel de medidas de similaridade em visualização de coleções de documentos." Universidade de São Paulo, 2012. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-24012013-155903/.

Full text
Abstract:
Information visualization techniques, such as similarity-based point placement, are used to generate visual data representations that reveal certain patterns. These techniques are sensitive to data quality, which in turn depends on a very influential preprocessing step. This step involves cleaning the text and, in some cases, detecting terms and their weights, as well as defining a (dis)similarity function. There are few studies on how these (dis)similarity calculations affect the quality of visual representations of textual data. This work presents a study on the role of different (dis)similarity measures in generating visual maps. We focus primarily on two types of distance functions: those based on vector representations of the text (Vector Space Model, VSM) and measures obtained from direct comparison of text strings, comparing their effect on visual maps built with point placement techniques. For this, objective measures were employed to compare the visual quality of the generated maps, such as the Neighborhood Hit (NH) and the Silhouette Coefficient (SC). We found that both approaches have strengths, but in general the VSM showed better results as far as class discrimination is concerned. However, the conventional VSM is not incremental, i.e., new additions to the collection force the recalculation of the data space and of previously computed dissimilarities. Thus, a new incremental model based on the VSM (Incremental Vector Space Model, iVSM) was also considered in our comparative studies. The iVSM showed the best quantitative and qualitative results in several of the configurations tested. The evaluation results are presented, and recommendations on applying different text similarity measures in visual analysis tasks are provided.
16

Hlavička, Ladislav. "Dolování asociačních pravidel z datových skladů." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2009. http://www.nusl.cz/ntk/nusl-235501.

Full text
Abstract:
This thesis deals with mining association rules over data warehouses. The first part familiarizes the reader with terms like knowledge discovery in databases and data mining. The following part deals with data warehouses. Next, association analysis, association rules, their types, and options for mining them are described. The architecture of Microsoft SQL Server and its tools for working with data warehouses are presented. The rest of the thesis covers the description and analysis of the Star-miner algorithm and the design, implementation, and testing of the application.
17

Hader, Sören. "Data mining auf multidimensionalen und komplexen Daten in der industriellen Bildverarbeitung." Berlin Pro Business, 2006. http://deposit.d-nb.de/cgi-bin/dokserv?id=2984734&prov=M&dok_var=1&dok_ext=htm.

Full text
18

Hader, Sören. "Data mining auf multidimensionalen und komplexen Daten in der industriellen Bildverarbeitung /." Berlin : Pro Business, 2007. http://deposit.d-nb.de/cgi-bin/dokserv?id=2984734&prov=M&dok_var=1&dok_ext=htm.

Full text
19

Bandini, Lorenzo. "Progettazione di un sistema di outlier detection per cubi multidimensionali." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2017. http://amslaurea.unibo.it/12931/.

Full text
20

Pumprla, Ondřej. "Získávání znalostí z datových skladů." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2009. http://www.nusl.cz/ntk/nusl-236715.

Full text
Abstract:
This Master's thesis deals with the principles of the data mining process, especially the mining of association rules. It lays out the theoretical background on the general description and principles of data warehouse creation. On the basis of this theoretical knowledge, an application for association rule mining is implemented. The application requires data in transactional form or multidimensional data organized in a star schema. The implemented algorithms for finding frequent patterns are Apriori and FP-tree. The system allows various parameter settings for the mining process, and validation tests and efficiency measurements were carried out. In terms of support for association rule searching, the resulting application is more applicable and robust than the compared existing systems, SAS Miner and Oracle Data Miner.
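Whichever algorithm finds the frequent patterns, the rules themselves are scored with support and confidence; here is a tiny illustration with made-up transactions.

```python
# Sketch: support and confidence of the rule {milk} -> {bread}.
transactions = [{"milk", "bread"}, {"milk", "bread", "butter"},
                {"bread", "butter"}, {"milk", "butter"}, {"milk", "bread"}]

def support(itemset):
    # Fraction of transactions containing every item of the itemset.
    return sum(itemset <= t for t in transactions) / len(transactions)

both, milk = support({"milk", "bread"}), support({"milk"})
print(f"support={both:.2f}, confidence={both / milk:.2f}")  # 0.60, 0.75
```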
21

Videla, Cavieres Iván Fernando. "Improvement of recommendation system for a wholesale store chain using advanced data mining techniques." Tesis, Universidad de Chile, 2015. http://repositorio.uchile.cl/handle/2250/133522.

Full text
Abstract:
Master's in Operations Management
Industrial Civil Engineering degree
In retail companies, Customer Intelligence areas have many opportunities to improve their strategic decisions using the information they can obtain from records of interactions with their customers; however, processing these large volumes of data has become a challenge. One of the everyday problems is segmenting or grouping customers: most companies build groups by spending level, not by similarity of shopping baskets as the literature proposes. Another challenge is to increase sales on each customer visit and to build loyalty, and recommender systems are one of the techniques used to achieve this. In this work, around half a billion transaction records from a wholesale supermarket chain were processed. Applying traditional clustering and market basket analysis techniques yielded low-quality results that were very hard to interpret, and no groups could be identified that would classify a customer according to purchase history. Understanding that the simultaneous presence of two products on the same receipt implies a relationship between them, a graph mining method based on social networks was used, yielding identifiable groups of products, called communities, to which a customer can belong. The robustness of the model is confirmed by the stability of the groups generated over different time periods. Under the same constraints imposed by the company, recommendations are generated based on purchase history and on customers' membership in the different product groups. Customers thus receive far more pertinent recommendations, not merely based on what other customers also bought. This novel way of solving the customer segmentation problem improves the Chilean wholesale supermarket chain's current recommendation method by 140%, which translates into an increase of more than 430% in potential revenue.
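A minimal sketch of the co-purchase graph idea described above: products are nodes, receipts create weighted edges, and product communities are read off the graph. Modularity-based detection (networkx) stands in here for whatever community method the thesis actually used, and the receipts are toy data.

```python
# Sketch: build a weighted product co-occurrence graph from receipts,
# then detect product communities on it.
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

receipts = [["rice", "beans", "oil"], ["rice", "beans"],
            ["soda", "chips"], ["soda", "chips", "beer"], ["rice", "oil"]]

g = nx.Graph()
for items in receipts:
    for a in items:
        for b in items:
            if a < b:   # one undirected edge per product pair
                w = g.get_edge_data(a, b, {"weight": 0})["weight"]
                g.add_edge(a, b, weight=w + 1)

for community in greedy_modularity_communities(g, weight="weight"):
    print(sorted(community))   # e.g. [beans, oil, rice] vs [beer, chips, soda]
```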
22

Andery, Gabriel de Faria. "Integrando projeções multidimensionais à analise visual de redes sociais." Universidade de São Paulo, 2010. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-06102010-111345/.

Full text
Abstract:
For decades, social science researchers have sought graphical forms to express human social relationships. The development of computer science and, more recently, of the Internet has given rise to a new field of research for visualization and social science professionals: social network visualization. This field can reveal patterns that benefit a large number of applications and individuals in areas such as commerce, security, knowledge networks, and marketing. Most social network visualization algorithms and systems rely on graph representations, highlighting relationships amongst individuals and groups of individuals but mostly neglecting the other available attributes of individuals. This work presents a set of tools to represent and explore social networks visually while taking the attributes of the nodes into consideration. The first technique employs heterogeneous networks, where both individuals and communities are represented in the graph; the second uses visualization techniques based on multidimensional projection, which place the data on the plane according to some attribute-based similarity criterion; a third coordinates multiple views in order to quickly focus on regions of interest in the data sets. The results indicate that the solutions provide a high degree of representational power and enable identification of concepts not easily obtained via other methods; the evidence comes from case studies as well as a user evaluation. This work includes a study of graph visualization for social network analysis as well as a system implementing the proposed solutions, integrating network visualization and multidimensional projections to extract patterns from social networks.
23

Paterlini, Adriano Arantes. "Imersão de espaços métricos em espaços multidimensionais para indexação de dados usando detecção de agrupamentos." Universidade de São Paulo, 2011. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-25042011-155810/.

Full text
Abstract:
The success of Database Management Systems (DBMSs) in applications involving traditional data (numbers and short texts) has encouraged their use in new types of applications that require the manipulation of complex data. Time series, scientific data, and multimedia data are examples of complex data, and several application fields, such as medical informatics, have demanded solutions for managing them. Complex data can also be studied by means of knowledge discovery (KDD) techniques using appropriate clustering algorithms. However, these algorithms have high computational cost, hindering their use on large data sets. The techniques developed in the database research field for indexing metric spaces usually treat the data set as uniform, without taking into account the existence of clusters, so the structures try to maximize query efficiency over the entire set simultaneously. Yet similarity searches are often limited to a specific region of the data set. In this context, this dissertation proposes a new access method able to index metric data efficiently, especially for sets containing clusters. It also proposes a new clustering algorithm for metric data that makes the selection of a medoid from a particular subset of elements more efficient. The experimental results showed that the proposed algorithms FAMES and M-FAMES can be used as clustering techniques for complex data and outperform PAM, CLARA, and CLARANS in effectiveness and efficiency. Moreover, similarity searches performed with the proposed metric access method FAMESMAM proved especially appropriate for data sets with clusters.
24

Nagappan, Rajehndra Yegappan. "Mining multidimensional data through compositional visualisation." PhD thesis, 2001. http://hdl.handle.net/1885/146042.

Full text
25

Bjering, Heidi. "A framework for temporal abstractive multidimensional data mining." Thesis, 2008. http://handle.uws.edu.au:8081/1959.7/487616.

Full text
Abstract:
In the industrialised world, premature birth has been recognised as one of the most significant perinatal health issues (Kramer, Platt et al. 1998). In Australia, 8.1% of babies are born before 37 weeks gestation (Laws, Abeywardana et al. 2007). Premature babies often have prolonged stays in Neonatal Intensive Care Units (NICUs) and can suffer from a number of different conditions during their stay. Some of these conditions exhibit variations in physiological parameters that can indicate their onset before it can be detected by other means. Medical monitoring equipment produces large masses of data, which makes analysing this data manually impossible. Adding to the complexity of the large datasets is the nature of physiological monitoring data: the data is multidimensional, and it is not only changes in individual dimensions that are significant, but sometimes simultaneous changes in several dimensions. Because the monitoring equipment produces temporal time series, clinical research frameworks are needed that preserve both dimensionality and temporal behaviour during data mining. The aim of this research is to extend previous research that proposed a framework to support analysis and trend detection in historical data from NICU patients. The extensions contribute to fundamental data mining framework research through the integration of temporal abstraction and support for null hypothesis testing within the data mining processes. The application of this new data mining approach is the analysis of level shifts and trends in historical temporal data, cross-correlating data mining findings across multiple data streams for multiple neonatal intensive care patients in an attempt to discover new hypotheses indicative of the onset of some condition. These hypotheses can then be evaluated and defined as rules to be applied in the real-time monitoring of neonates, enabling early detection of the possible onset of conditions. This can assist faster decision making, which in turn may prevent conditions from developing into serious problems where treatment may be futile. This research employs a constructive research method; here, the problem is the inability of current data mining frameworks to fully support clinical research on multidimensional temporal data. The research has resulted in the design of a temporal abstraction multidimensional data mining (TAMDDM) framework suitable for clinical research on multidimensional temporal time series data. The framework is demonstrated through a case study with neonatal intensive care monitoring data.
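As a loose illustration of level-shift detection on one physiological channel, the sketch below compares a short recent window against a longer baseline; the window sizes and threshold are arbitrary choices, not values from the thesis.

```python
# Sketch: flag time points where a recent window's mean departs from a
# baseline window's mean by more than z baseline standard deviations.
import numpy as np

def level_shifts(signal, baseline=60, recent=10, z=3.0):
    flags = []
    for t in range(baseline + recent, len(signal)):
        base = signal[t - baseline - recent:t - recent]
        now = signal[t - recent:t]
        if abs(now.mean() - base.mean()) > z * (base.std() + 1e-9):
            flags.append(t)
    return flags

rng = np.random.default_rng(5)
hr = np.concatenate([rng.normal(140, 2, 300), rng.normal(160, 2, 100)])  # step at t=300
print(level_shifts(hr)[:5])   # first flagged time points, shortly after the shift
```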
26

Wu, Chin-Ang, and 吳錦昂. "Towards Intelligent Data Warehouse Mining with Ontology-An Example for Multidimensional Association Rule Mining." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/85562411349734227075.

Full text
Abstract:
Doctoral dissertation
I-Shou University
PhD Program, Department of Information Engineering
Academic year 98 (ROC calendar)
Multidimensional association mining makes data mining more robust because it provides more specific conditional settings of the target data for mining; managers can thus obtain useful knowledge that is close to their needs. Data warehousing plays a key role in decision support by providing knowledge discovery and data mining systems with cleaned and integrated data. In practice, data warehouse mining systems have provided many applicable solutions in industry, such as DBMiner by J. Han's research group. Yet problems remain that cost users extra effort in discovering knowledge, prevent them from obtaining the truly useful knowledge they want, or leave them with knowledge that is out of date because the data warehouse has changed or business rules have been modified. If intelligent assistance were offered in the mining system, users could perform the mining process more effectively and efficiently, and the discovered knowledge could be renewed as necessary. We have observed the following deficiencies that hinder such intelligent assistance: 1) lack of a semantic portrayal of the data, 2) lack of facilities for capturing users' mining intentions, and 3) lack of capabilities for actively renewing the discovered knowledge. Therefore, this dissertation presents an intelligent data warehouse mining system that incorporates schema ontology, schema constraint ontology, domain ontology, and user preference ontology. With the support of the knowledge in these ontologies, the system framework mainly provides intelligent assistance in formulating effective queries and offers an active mining mechanism. The structures of these ontologies are illustrated, and how they benefit the regular mining process and active mining is demonstrated through examples of multidimensional association rule mining. A prototype of the proposed system enables multidimensional association mining and provides preliminary functions of the intelligent support by ontologies. Experiments verifying the effectiveness of the semantic checking and query recommendation provided by the system framework were conducted, and the effectiveness of surrogate queries was also tested. The results show that the proposed system can help users, especially inexperienced ones, improve their mining efficiency. We conclude that the proposed framework brings users the following advantages: (1) it helps users clarify their mining intentions, (2) it finds concept-extended rules from an existing primitive data warehouse, (3) it allows mining constraints to be set more precisely, (4) it allows active re-mining of newly updated knowledge based on the user preference ontology, and (5) it automatically dispatches specific re-mining results to specific users according to their preferences.
27

Yang, Yi-Bin, and 楊翊彬. "Efficient Workload for Multidimensional and Multilevel Association Rule Mining on Data Cubes." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/b26zfj.

Full text
Abstract:
Master's thesis
National Central University
Graduate Institute of Computer Science and Information Engineering
Academic year 96 (2007/2008)
Association rule mining plays an important role in decision support systems: it finds interesting rules in huge amounts of historical data. In the past, when decision support systems used transactional databases as backends, research focused on improving the performance of mining association rules. Nowadays, decision support systems often come with several frontends and a data warehouse as the backend; the frontends send preprocessed user queries and fetch the requested data, while the central data warehouse has to respond to a stream of requests from different users, answering over historical data at multiple dimensions and levels. Efficiently answering mining queries over different dimensions and different levels of abstraction is therefore an important issue for decision support systems. Based on our observations, an analysis process comprises a series of related queries, and many mining queries share common computation results. We propose an association rule mining system framework that processes queries as a workload, managing and optimizing materialized tables and reusing results among queries to complete the entire workload efficiently.
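The result-sharing idea can be sketched as a cache of materialised cuboids: an aggregate computed for one query is rolled up to answer later, coarser queries in the workload instead of rescanning the fact table. The sketch below is a deliberately naive stand-in for the thesis's workload optimizer, with invented table and column names.

```python
import pandas as pd

class CuboidCache:
    """Materialise group-by aggregates once and reuse them across a
    workload of mining queries (a naive sketch of result sharing)."""

    def __init__(self, fact_table, measure):
        self.facts = fact_table
        self.measure = measure
        self.cache = {}

    def aggregate(self, dims):
        key = tuple(sorted(dims))
        if key in self.cache:
            return self.cache[key]
        # Reuse: roll up a finer cached cuboid instead of rescanning facts.
        for cached_key, table in self.cache.items():
            if set(key) < set(cached_key):
                result = table.groupby(level=list(key)).sum()
                break
        else:
            result = self.facts.groupby(list(key))[self.measure].sum()
        self.cache[key] = result
        return result

sales = pd.DataFrame({
    "region":  ["north", "north", "south", "south"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "amount":  [100, 120, 80, 90],
})
workload = CuboidCache(sales, "amount")
workload.aggregate(["region", "quarter"])  # scanned from the fact table
print(workload.aggregate(["region"]))      # rolled up from the cached cuboid
```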
APA, Harvard, Vancouver, ISO, and other styles
28

"A new approach to circular unidimensional scaling." 2002. http://library.cuhk.edu.hk/record=b5891287.

Full text
Abstract:
Li Chi Yin.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2002.
Includes bibliographical references (leaves 78-80).
Abstracts in English and Chinese.
Chapter 1 --- Introduction --- p.1
Chapter 1.1 --- Multidimensional Scaling (MDS) --- p.1
Chapter 1.2 --- Unidimensional Scaling (UDS) --- p.15
Chapter 1.3 --- Circular Unidimensional Scaling (CDS) --- p.17
Chapter 1.4 --- The Goodness of fit of models --- p.24
Chapter 1.5 --- The admissible transformations of the MDS configuration --- p.26
Chapter 2 --- Computational Methods on MDS, UDS and CDS --- p.29
Chapter 2.1 --- Classical Scaling --- p.29
Chapter 2.2 --- Guttman's updating algorithm and Pliner's smoothing algorithm --- p.36
Chapter 2.3 --- Circular Unidimensional Scaling/Circumplex Model --- p.43
Chapter 3 --- A new algorithm for CDS --- p.45
Chapter 3.1 --- Method of choosing a good starting value in Guttman's updating algorithm and Pliner's smoothing algorithm --- p.46
Chapter 3.2 --- A new approach for circular unidimensional scaling --- p.54
Chapter 3.3 --- Examples --- p.62
Chapter 3.3.1 --- Comparison of the new approach to existing method --- p.62
Chapter 3.3.2 --- Illustrations of application to political data --- p.64
Chapter 4 --- Conclusion and Extensions --- p.67
Chapter A --- Figures and Tables --- p.70
Chapter B --- References --- p.78
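For orientation, circular unidimensional scaling places objects at angles on a circle and fits the shorter arc distances to the observed dissimilarities by minimising a stress function. The sketch below uses plain numerical gradient descent rather than the Guttman updating or Pliner smoothing algorithms treated in the thesis, and the four-object dissimilarity matrix is invented; depending on the random start, the descent may settle in a local minimum.

```python
import numpy as np

def circ_dist(a, b):
    """Shorter arc length between angles a and b on the unit circle."""
    d = np.abs(a - b) % (2 * np.pi)
    return np.minimum(d, 2 * np.pi - d)

def stress(dissim, theta):
    """Raw stress: squared differences between observed dissimilarities
    and fitted circular distances, summed over pairs."""
    n = len(theta)
    return sum((dissim[i, j] - circ_dist(theta[i], theta[j])) ** 2
               for i in range(n) for j in range(i + 1, n))

def fit_cds(dissim, n_iter=2000, lr=0.01, eps=1e-5, seed=0):
    """Fit angles by numerical gradient descent on the stress;
    a toy stand-in for the Guttman/Pliner updates."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0, 2 * np.pi, dissim.shape[0])
    for _ in range(n_iter):
        grad = np.zeros_like(theta)
        for k in range(len(theta)):  # central finite differences
            up, down = theta.copy(), theta.copy()
            up[k] += eps
            down[k] -= eps
            grad[k] = (stress(dissim, up) - stress(dissim, down)) / (2 * eps)
        theta -= lr * grad
    return theta % (2 * np.pi)

# Four objects whose dissimilarities match 90-degree spacing on the circle.
d = np.pi / 2 * np.array([[0, 1, 2, 1],
                          [1, 0, 1, 2],
                          [2, 1, 0, 1],
                          [1, 2, 1, 0]])
theta = fit_cds(d)
print("fitted angles (degrees):", np.round(np.degrees(np.sort(theta))))
print("final stress:", round(stress(d, theta), 4))
```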
APA, Harvard, Vancouver, ISO, and other styles
29

"Visual data mining: Using parallel coordinate plots with K-means clustering and color to find correlations in a multidimensional dataset." KUTZTOWN UNIVERSITY OF PENNSYLVANIA, 2009. http://pqdtopen.proquest.com/#viewpdf?dispub=1462396.

Full text
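The technique named in the title can be reproduced in a few lines: cluster the observations with K-means, then colour each polyline of a parallel-coordinates plot by its cluster label so that correlated dimensions appear as coherent bands. The Iris data below merely stands in for the thesis's dataset, and the colour choices are arbitrary.

```python
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
df = iris.data.copy()

# Cluster in the full 4-dimensional space, then use the labels as colours.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(df)
df["cluster"] = labels.astype(str)

parallel_coordinates(df, class_column="cluster",
                     color=("#1b9e77", "#d95f02", "#7570b3"))
plt.title("Parallel coordinates coloured by k-means cluster")
plt.show()
```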
APA, Harvard, Vancouver, ISO, and other styles
30

Cierco, Agliberto Alves. "Uso de sistemas multidimensionais e algoritmos de data mining para implantação do método Time Driven Activity Based Costing (TDABC) em organizações orientadas por projectos." Doctoral thesis, 2015. http://hdl.handle.net/10071/13456.

Full text
Abstract:
JEL: M150 and M410
The ABC (Activity Based Costing) method was introduced to organize the way costs are partitioned among enterprise management activities, and it deeply changed the way this division used to be made. The huge advantages of employing the method, as well as the challenges of implementing it, soon became clear. The TDABC (Time Driven Activity Based Costing) method was designed to overcome the operational difficulties of using ABC. Rather than employing estimates, usually provided by company employees, of the percentage of time spent on each management activity, TDABC introduces two pivotal changes with respect to its predecessor. First, TDABC considers idle time relative to the potential total time available for work. Second, TDABC calculates the cost per hour of work; the overall activity cost is then obtained by multiplying this hourly cost by the number of work hours the activity requires. TDABC produces a fundamental output when deployed in a company: the set of time equations for the management activities. Through these equations, it is possible to calculate the time spent in each activity at different levels of execution complexity. This is possible only because ERP (Enterprise Resource Planning) systems record every action performed within the company. This thesis makes two main proposals concerning the use of TDABC in enterprises. The first is to track the time spent on management activities with an ERP system coupled to a Business Intelligence (BI) system, rather than with an ERP system alone. The second follows from the first: to use the data mining algorithms available in BI suites (mainly induction tree and cluster analysis algorithms) to detect the complexity levels within the time equations. Justifying the first proposal, it is shown that ERP systems were never designed to detect patterns in the data they store, so on their own they cannot detect the complexity levels present in the execution of a given activity. For the second proposal, it is shown that in project-oriented organizations, or in organizations with departments that run projects and can be treated as analogous, the number of activities and the data they generate are so large that an automatic procedure for detecting complexity levels becomes necessary. The thesis is organized as follows. First, the ABC method and the reasons that led to the subsequent TDABC model are reviewed. Then the concepts of project management and Business Intelligence are presented, notably the multidimensional data architecture and the data mining algorithms, introducing the way BI makes it possible to differentiate complexity levels in the time equations; to this end, the MDX (Multidimensional Expressions) language for building BI reports is introduced. It is also shown, through an introduction to ERP systems, that this type of system alone could not deliver such results. Finally, to illustrate these concepts, the thesis reports the collection of activity data from projects carried out in three organizations and the application of BI to generate time equations over those data.
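The two TDABC quantities described above reduce to simple arithmetic: a capacity cost rate (cost of supplied capacity divided by practical capacity, with idle time excluded) and a time equation whose terms encode complexity drivers. All figures in the sketch below are hypothetical.

```python
# TDABC sketch: capacity cost rate times a time equation.
# All numbers are invented for illustration.

quarterly_capacity_cost = 84_000.0   # cost of resources supplied (EUR)
practical_capacity_min = 60_000.0    # usable minutes per quarter, idle time excluded
cost_rate = quarterly_capacity_cost / practical_capacity_min  # EUR per minute

def order_handling_minutes(n_lines, rush=False, new_customer=False):
    """Time equation: base time plus an increment per complexity driver."""
    return (5.0 + 2.0 * n_lines
            + (10.0 if rush else 0.0)
            + (15.0 if new_customer else 0.0))

minutes = order_handling_minutes(n_lines=4, rush=True)   # 5 + 8 + 10 = 23 min
print(f"cost rate = {cost_rate:.3f} EUR/min")             # 1.400
print(f"activity cost = {minutes * cost_rate:.2f} EUR")   # 23 * 1.4 = 32.20
```

The thesis's point is that the complexity drivers (here, rush and new_customer) need not be guessed by staff: they can be detected from ERP/BI data with induction tree or clustering algorithms.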
APA, Harvard, Vancouver, ISO, and other styles
31

Carvalho, Mariana Reimão Queiroga Valério de. "Enhancing the process of view selection in data cubes using what-If analysis." Doctoral thesis, 2019. http://hdl.handle.net/1822/66886.

Full text
Abstract:
Doctoral thesis in Informatics
To compete in today's society, enterprise managers need to be able to deal with the challenges of a competitive market. Increasing competition and the growing volume of electronic information pose new challenges for decision-making processes. Collecting relevant information and using Business Intelligence tools are determining factors in decision-making and in gaining competitive advantage. However, gathering and storing relevant information may not be enough; the ability to simulate hypothetical business scenarios could be the advantage that companies need, and What-If analysis can help achieve it. What-If analysis allows users to create simulation models that explore the behaviour of a system by analysing the effects of changing parameter values, effects that could not otherwise be discovered by manual analysis of historical data, thus enabling analysis of the consequences of those changes. A successful What-If analysis process depends mainly on the user's experience, knowledge of the business information, and familiarity with the What-If analysis process itself; without these, it can become a long and difficult process, especially in the choice of input parameters for the analysis. This doctoral thesis proposes a hybridization methodology that integrates OLAP preferences into the conventional What-If analysis process. The integration aims to discover the best recommendations for choosing the input parameters of the analysis scenarios using OLAP preferences, helping the user overcome the difficulties that normally arise in the conventional What-If analysis process. The developed methodology helps to discover more specific, oriented and detailed information that could not be found using conventional What-If analysis.
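A What-If scenario of the kind described can be sketched as cloning a slice of historical data, perturbing one input parameter, and comparing the simulated measure against the historical one. The preference-driven recommendation of which parameters to perturb is the thesis's contribution and is not reproduced here; the demand-elasticity response below is an invented assumption, as are all table and column names.

```python
import pandas as pd

history = pd.DataFrame({
    "product": ["A", "A", "B", "B"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "units":   [100, 110, 60, 70],
    "price":   [10.0, 10.0, 15.0, 15.0],
})
history["revenue"] = history["units"] * history["price"]

def what_if(df, filters, param, change, elasticity=-0.5):
    """Clone the slice selected by `filters`, apply a relative `change`
    to `param`, and re-derive revenue under a hypothetical demand response."""
    scenario = df.copy()
    mask = pd.Series(True, index=df.index)
    for col, val in filters.items():
        mask &= scenario[col] == val
    scenario.loc[mask, param] *= (1 + change)
    # Invented behaviour: units shift by elasticity * relative price change.
    scenario.loc[mask, "units"] *= (1 + elasticity * change)
    scenario["revenue"] = scenario["units"] * scenario["price"]
    return scenario

base = history.groupby("product")["revenue"].sum()
sim = (what_if(history, {"product": "A"}, "price", 0.10)
       .groupby("product")["revenue"].sum())
print(pd.concat([base, sim], axis=1, keys=["historical", "simulated"]))
```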
APA, Harvard, Vancouver, ISO, and other styles
