Dissertations / Theses on the topic 'Multidimensional data mining'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the top 31 dissertations / theses for your research on the topic 'Multidimensional data mining.'
Next to every source in the list of references there is an 'Add to bibliography' button. Press it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.
Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.
Torre, Fabrizio. "3D data visualization techniques and applications for visual multidimensional data mining." Doctoral thesis, Università degli Studi di Salerno, 2014. http://hdl.handle.net/10556/1561.
As modern technology provides new tools to measure the world around us, we are quickly generating massive amounts of high-dimensional, spatio-temporal data. In this work, I deal with two types of datasets: one in which the spatial characteristics are relatively dynamic and the data are sampled at different periods of time, and the other where many dimensions prevail, although the spatial characteristics are relatively static. The first dataset refers to a peculiar aspect of uncertainty arising from the contractual relationships that regulate a project's execution: dispute management. In recent years there has been a growth in the size and complexity of the projects managed by public and private organizations. This leads to an increased probability of project failure, frequently due to the difficulty of achieving objectives such as on-time delivery, cost containment and expected quality. In particular, one of the most common causes of project failure is the very high degree of uncertainty that affects the expected performance of the project, especially when different stakeholders with divergent aims and goals are involved in the project...[edited by author]
Nimmagadda, Shastri Lakshman. "Ontology based data warehousing for mining of heterogeneous and multidimensional data sources." Thesis, Curtin University, 2015. http://hdl.handle.net/20.500.11937/2322.
Wu, Hao-cun, and 吳浩存. "A multidimensional data model for monitoring web usage and optimizing website topology." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2004. http://hub.hku.hk/bib/B29528215.
Peterson, Angela R. "Visual data mining: Using parallel coordinate plots with K-means clustering and color to find correlations in a multidimensional dataset." Kutztown University of Pennsylvania, 2009. http://www.kutztown.edu/library/services/remote_access.asp.
Ding, Guoxiang. "Deriving activity patterns from individual travel diary data: A spatiotemporal data mining approach." The Ohio State University, 2009. http://rave.ohiolink.edu/etdc/view?acc_num=osu1236777859.
Li, Hsin-Fang. "Data mining and pattern discovery using exploratory and visualization methods for large multidimensional datasets." UKnowledge, 2013. http://uknowledge.uky.edu/epb_etds/4.
Kucuktunc, Onur. "Result Diversification on Spatial, Multidimensional, Opinion, and Bibliographic Data." The Ohio State University, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=osu1374148621.
Foltýnová, Veronika. "Multidimenzionální analýza dat a zpracování analytického zobrazení." Master's thesis, Vysoké učení technické v Brně, Fakulta elektrotechniky a komunikačních technologií, 2018. http://www.nusl.cz/ntk/nusl-376922.
Nunes, Santiago Augusto. "Análise espaço-temporal de data streams multidimensionais." Universidade de São Paulo, 2015. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-17102016-152137/.
Data streams are usually characterized by large amounts of data generated continuously, in synchronous or asynchronous, potentially infinite processes, in applications such as meteorological systems, industrial processes, vehicle traffic, financial transactions and sensor networks. In addition, the behavior of the data tends to change significantly over time, defining evolutionary data streams. These changes may correspond to temporary events (such as anomalies or extreme events) or to relevant changes in the process that generates the stream (which result in changes in the distribution of the data). Furthermore, these data sets can have spatial characteristics, such as the geographic location of sensors, which can be useful in the analysis process. Detecting these behavioral changes while considering both the evolutionary aspects and the spatial characteristics of the data is relevant for some types of applications, such as the monitoring of extreme weather events in agrometeorology research. In this context, this project proposes a technique to support spatio-temporal analysis of multidimensional data streams containing spatial and non-spatial information. The adopted approach is based on concepts from Fractal Theory, used for temporal behavior analysis, as well as on techniques for data stream handling and hierarchical data structures, allowing analysis tasks that take the spatial and non-spatial aspects into account simultaneously. The technique was applied to agrometeorological data to identify different behaviors across sub-regions defined by the spatial characteristics of the data. Results from this work therefore include a contribution to the data mining area and support for research in agrometeorology.
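To make the Fractal Theory ingredient concrete, here is a minimal Python sketch of a box-counting dimension estimate, one of the fractal measures that stream-analysis techniques of this kind track over time to flag behavioral changes. The point set and grid sizes are invented for illustration; this is not the thesis's actual algorithm.

```python
import numpy as np

def box_counting_dimension(points, epsilons=(0.1, 0.05, 0.02, 0.01)):
    points = np.asarray(points, dtype=float)
    counts = []
    for eps in epsilons:
        # Count grid cells of side eps occupied by at least one point.
        cells = {tuple(np.floor(p / eps).astype(int)) for p in points}
        counts.append(len(cells))
    # The slope of log(count) versus log(1/eps) approximates the dimension.
    slope, _ = np.polyfit(np.log(1.0 / np.asarray(epsilons)), np.log(counts), 1)
    return slope

# 500 points along a straight line: the estimate (about 0.96 here)
# approaches the true dimension 1.0 as the grid gets finer.
line = np.column_stack([np.linspace(0, 1, 500), np.linspace(0, 1, 500)])
print(round(box_counting_dimension(line), 2))
```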
Nieto, Erick Mauricio Gómez. "Projeção multidimensional aplicada a visualização de resultados de busca textual." Universidade de São Paulo, 2012. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-05122012-105730/.
Internet users are very familiar with the results of a search query displayed as a ranked list of snippets. Each textual snippet shows a content summary of the referred document (or web page) and a link to it. This display has many advantages, e.g., it affords easy navigation and is straightforward to interpret. Nonetheless, any user of search engines could probably report some experience of disappointment with this metaphor. Indeed, it has limitations in particular situations, as it fails to provide an overview of the retrieved document collection. Moreover, depending on the nature of the query - e.g., it may be too general, ambiguous, or ill expressed - the desired information may be poorly ranked, or the results may contemplate varied topics. Several search tasks would be easier if users were shown an overview of the returned documents, organized so as to reflect how related they are, content-wise. We propose a visualization technique to display the results of web queries aimed at overcoming such limitations. It combines the neighborhood preservation capability of multidimensional projections with the familiar snippet-based representation, employing a multidimensional projection to derive two-dimensional layouts of the query results that preserve text similarity relations, or neighborhoods. Similarity is computed by applying the cosine similarity over a bag-of-words vector representation of the collection built from the snippets. If the snippets are displayed directly according to the derived layout they will overlap considerably, producing a poor visualization. We overcome this problem by defining an energy functional that considers both the overlap amongst snippets and the preservation of the neighborhood structure as given in the projected layout. Minimizing this energy functional provides a neighborhood-preserving two-dimensional arrangement of the textual snippets with minimum overlap. The resulting visualization conveys both a global view of the query results and visual groupings that reflect related results, as illustrated in several examples.
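As a concrete illustration of the similarity computation this abstract describes, the sketch below compares two snippets via cosine similarity over bag-of-words term-frequency vectors. The snippets are invented; the thesis pipeline additionally projects these similarities to a 2-D layout and removes snippet overlap.

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    # Bag-of-words term-frequency vectors for each snippet.
    va, vb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) \
         * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

# Three shared terms out of four and five: similarity is about 0.67.
print(cosine_similarity("mining multidimensional data streams",
                        "visual mining of multidimensional data"))
```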
Ivaškevičius, Klaidas. "Daugiamačių sekų šablonų analizė." Master's thesis, Lithuanian Academic Libraries Network (LABT), 2014. http://vddb.library.lt/obj/LT-eLABa-0001:E.02~2012~D_20140630_173416-93518.
The main goal of this master's thesis was to present algorithms, and combinations of algorithms, for multidimensional sequential pattern mining, and to implement an algorithm capable of performing it. The FP-Tree, which is used to store critical (for example, often repeated) data, is described. The FP-Growth algorithm, which analyzes the FP-Tree structure and produces a frequent pattern set as a result, is presented. The MD-PS-FPG algorithm - a combination of the modified FP-Growth and PrefixSpan algorithms - is introduced. The results of several tests and objectives for further work are also presented.
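For readers unfamiliar with the objects being mined, here is a minimal Python sketch of frequent-itemset mining in the spirit of FP-Growth. For brevity it enumerates sub-itemsets of frequency-ordered transactions rather than building conditional FP-trees, and the transactions and threshold are invented, not taken from the thesis.

```python
from collections import defaultdict
from itertools import combinations

transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
]
min_sup = 2  # absolute support threshold

# 1) Count single items and keep the frequent ones (FP-Growth's header table).
item_counts = defaultdict(int)
for t in transactions:
    for item in t:
        item_counts[item] += 1
frequent = {i for i, c in item_counts.items() if c >= min_sup}

# 2) Order each filtered transaction by descending item frequency (as when
#    inserting into an FP-tree) and count every candidate sub-itemset.
support = defaultdict(int)
for t in transactions:
    kept = sorted(t & frequent, key=lambda i: (-item_counts[i], i))
    for r in range(1, len(kept) + 1):
        for subset in combinations(kept, r):
            support[subset] += 1

patterns = {s: c for s, c in support.items() if c >= min_sup}
print(patterns)  # e.g. ('bread', 'milk'): 3, ('bread', 'butter', 'milk'): 2
```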
Nogueira, Rodrigo Ramos. "Newsminer: um sistema de data warehouse baseado em texto de notícias." Universidade Federal de São Carlos, 2017. https://repositorio.ufscar.br/handle/ufscar/9138.
Data and text mining applications managing Web data have been the subject of recent research. In all cases, data mining tasks need to work on clean, consistent, and integrated data to obtain the best results. Thus, Data Warehouse environments are a valuable source of clean, integrated data for data mining applications, and Data Warehouse technology has evolved to retrieve and process data from the Web. In particular, news websites are rich sources that can compose a linguistic corpus. By inserting a corpus into a Data Warehousing environment, applications can take advantage of the flexibility that a multidimensional model and OLAP operations provide. Among the benefits are navigation through the data, selection of the part of the data considered relevant, data analysis at different levels of abstraction, and aggregation, disaggregation, rotation and filtering over any set of data. This work presents Newsminer, a data warehouse environment which provides a consistent and clean set of texts in the form of a multidimensional corpus for consumption by external applications and users. The proposal includes an architecture that integrates the gathering of news in real time, a semantic enrichment module as part of the ETL stage, which adds semantic properties to the data such as the news category and POS-tagging annotations, and access to data cubes for consumption by applications and users. Two experiments were performed. The first selected the best news classifier for the semantic enrichment module; the statistical analysis of the results indicated that the Perceptron classifier achieved the best F-measure with good computational time. The second collected data to evaluate real-time news preprocessing; for the collected data set, the results indicated that online processing time is achievable.
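As a toy illustration of the multidimensional-corpus idea (not Newsminer's actual schema; the rows and attributes are invented), a news table enriched with semantic attributes can be summarized along dimensions in cube fashion:

```python
import pandas as pd

news = pd.DataFrame({
    "category": ["sports", "politics", "sports", "economy"],
    "date":     ["2017-05-01", "2017-05-01", "2017-05-02", "2017-05-02"],
    "words":    [250, 410, 320, 290],
})

# A small cube-like aggregation: news count and total words per
# (category, date) cell; adding dimensions corresponds to drill-down.
cube = news.pivot_table(index="category", columns="date",
                        values="words", aggfunc=["count", "sum"])
print(cube)
```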
Egho, Elias. "Extraction de motifs séquentiels dans des données séquentielles multidimensionnelles et hétérogènes : une application à l'analyse de trajectoires de patients." Thesis, Université de Lorraine, 2014. http://www.theses.fr/2014LORR0066/document.
All domains of science and technology produce large and heterogeneous data. Although a lot of work has been done in this area, mining such data is still a challenge. No previous research work targets the mining of heterogeneous multidimensional sequential data. This thesis proposes a contribution to knowledge discovery in heterogeneous sequential data. We study three different research directions: (i) extraction of sequential patterns, (ii) classification and (iii) clustering of sequential data. Firstly, we generalize the notion of a multidimensional sequence by considering complex and heterogeneous sequential structures. We present a new approach, called MMISP, to extract sequential patterns from heterogeneous sequential data. MMISP generates a large number of sequential patterns, as is usually the case for pattern enumeration algorithms. To overcome this problem, we propose a novel way of considering heterogeneous multidimensional sequences by mapping them into pattern structures. We develop a framework for enumerating only the patterns satisfying given constraints. The second research direction concerns the classification of heterogeneous multidimensional sequences. We use Formal Concept Analysis (FCA) as a classification method. We show interesting properties of concept lattices and of the stability index for classifying sequences into a concept lattice and selecting interesting groups of sequences. The third research direction concerns the clustering of heterogeneous multidimensional sequential data. We focus on the notion of common subsequences to define similarity between a pair of sequences composed of lists of itemsets. We use this similarity measure to build a similarity matrix between sequences and to separate them into different groups. In this work, we present theoretical results and an efficient dynamic programming algorithm to count the number of common subsequences between two sequences without enumerating all subsequences. The system resulting from this research work was applied to analyze and mine patient healthcare trajectories in oncology. Data were taken from a medico-administrative database including all information about the hospitalizations of patients in the Lorraine region (France). The system allows identifying and characterizing episodes of care for specific sets of patients. Results were discussed and validated with domain experts.
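The thesis's counting algorithm works on sequences of itemsets and is not reproduced here, but a simplified dynamic program in the same spirit can be sketched for plain symbol sequences. It counts matched subsequence pairs (including the empty one) without enumerating them; the example strings are invented.

```python
def count_common_subsequences(a, b):
    n, m = len(a), len(b)
    # N[i][j]: number of common-subsequence matchings of a[:i] and b[:j],
    # counting the empty subsequence, hence the initialisation with 1.
    N = [[1] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                N[i][j] = N[i - 1][j] + N[i][j - 1]
            else:
                N[i][j] = N[i - 1][j] + N[i][j - 1] - N[i - 1][j - 1]
    return N[n][m]

print(count_common_subsequences("abc", "abc"))  # 8: all subsets of {a, b, c}
```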
Paulovich, Fernando Vieira. "Mapeamento de dados multi-dimensionais - integrando mineração e visualização." Universidade de São Paulo, 2008. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-04032009-145018/.
Projection or point placement techniques, useful for mapping multidimensional data into visual spaces, have always raised interest in the visualization and data analysis communities because they can support data exploration based on similarity or correlation relations. Regardless of that interest, various problems arise when dealing with such techniques, impairing their widespread application. In particular, the projections that yield the highest-quality layouts have prohibitive computational cost for large data sets. Additionally, there are issues regarding visual scalability, i.e., the capability to visually fit the individual points in the exploration space as the data set grows large. This thesis treats the problems of projections from various perspectives, presenting novel techniques that solve, to a certain extent, several of the verified problems. It is also a fact that the size and complexity of data sets suggest the integration of data mining capabilities into the visualization pipeline, both during the mapping process and as tools to extract additional information after the data have been laid out. This thesis also adds some aspects of mining to the multidimensional visualization process, mainly for the particular application of analysis of document collections, proposing and implementing an approach for topic extraction. As supporting tools for testing these techniques and comparing them to existing ones, different software systems were written. The main one includes the techniques developed here as well as several of the classical projection and dimensionality reduction techniques, and can be used for exploring various kinds of data sets, with additional functionality to support the mapping of document collections. This thesis contributes to the understanding of the projection or mapping problem and develops new techniques that are fast, treat adequately the visual formation of groups of highly related data items, separate those groups properly and allow exploration of the data at various levels of detail.
Salazar, Frizzi Alejandra San Roman. "Um estudo sobre o papel de medidas de similaridade em visualização de coleções de documentos." Universidade de São Paulo, 2012. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-24012013-155903/.
Information visualization techniques, such as similarity-based point placement, are used to generate visual representations of data that evidence patterns. These techniques are sensitive to data quality, which depends on a very influential preprocessing step. This step involves cleaning the text and, in some cases, detecting terms and their weights, as well as defining a (dis)similarity function. There are few studies on how these (dis)similarity calculations affect the quality of visual representations of textual data. This work presents a study on the role of various (dis)similarity measures in generating visual maps. We focus primarily on two types of distance functions: those based on vector representations of the text (the Vector Space Model (VSM)) and measures obtained from the direct comparison of text strings, comparing their effect on the visual maps obtained with point placement techniques. For this, objective measures were employed to compare the visual quality of the generated maps, such as the Neighborhood Hit and the Silhouette Coefficient. We found that both approaches have strengths but, in general, the VSM showed better results as far as class discrimination is concerned. However, the conventional VSM is not incremental, i.e., new additions to the collection force the recalculation of the data space and of dissimilarities previously computed. Thus, a new model based on an incremental VSM (the Incremental Vector Space Model (iVSM)) has also been considered in our comparative studies. The iVSM showed the best quantitative and qualitative results in several of the configurations considered. The evaluation results are presented, and recommendations on the application of different similarity measures for visual text analysis tasks are provided.
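Of the two quality measures named above, the Neighborhood Hit is simple enough to sketch. Below is a minimal Python version under its usual definition (the average fraction of each projected point's k nearest neighbors that share its class label); the points and labels are invented.

```python
import numpy as np

def neighborhood_hit(points, labels, k=3):
    points, labels = np.asarray(points, float), np.asarray(labels)
    hits = []
    for i in range(len(points)):
        dists = np.linalg.norm(points - points[i], axis=1)
        nearest = np.argsort(dists)[1:k + 1]  # skip the point itself
        hits.append(np.mean(labels[nearest] == labels[i]))
    return float(np.mean(hits))

# Two well-separated classes in a 2-D "projection": the score is 1.0.
pts = [[0, 0], [0.1, 0.2], [0.2, 0.1], [5, 5], [5.1, 5.2], [5.2, 4.9]]
print(neighborhood_hit(pts, [0, 0, 0, 1, 1, 1], k=2))
```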
Hlavička, Ladislav. "Dolování asociačních pravidel z datových skladů." Master's thesis, Vysoké učení technické v Brně, Fakulta informačních technologií, 2009. http://www.nusl.cz/ntk/nusl-235501.
Hader, Sören. "Data mining auf multidimensionalen und komplexen Daten in der industriellen Bildverarbeitung." Berlin: Pro Business, 2006. http://deposit.d-nb.de/cgi-bin/dokserv?id=2984734&prov=M&dok_var=1&dok_ext=htm.
Hader, Sören. "Data mining auf multidimensionalen und komplexen Daten in der industriellen Bildverarbeitung." Berlin: Pro Business, 2007. http://deposit.d-nb.de/cgi-bin/dokserv?id=2984734&prov=M&dok_var=1&dok_ext=htm.
Bandini, Lorenzo. "Progettazione di un sistema di outlier detection per cubi multidimensionali." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2017. http://amslaurea.unibo.it/12931/.
Pumprla, Ondřej. "Získávání znalostí z datových skladů." Master's thesis, Vysoké učení technické v Brně, Fakulta informačních technologií, 2009. http://www.nusl.cz/ntk/nusl-236715.
Videla Cavieres, Iván Fernando. "Improvement of recommendation system for a wholesale store chain using advanced data mining techniques." Tesis, Universidad de Chile, 2015. http://repositorio.uchile.cl/handle/2250/133522.
Degree: Ingeniero Civil Industrial.
In retail companies, Customer Intelligence areas have many opportunities to improve their strategic decisions using the information that can be obtained from the records of interactions with their customers. However, processing these large volumes of data has become a challenge. One of the problems faced day to day is segmenting or grouping customers: most companies build groups by spending level, not by similarity of shopping baskets, as the literature proposes. Another challenge for these companies is to increase sales on each customer visit and to build loyalty, and one technique used to achieve this is recommender systems. In this work, around half a billion transactional records from a wholesale supermarket chain were processed. When traditional Clustering and Market Basket Analysis techniques are applied, the results are of low quality and very hard to interpret; moreover, they fail to identify groups that would allow classifying a customer according to his or her purchase history. Understanding that the simultaneous presence of two products on the same receipt implies a relationship between them, a graph mining method based on social networks was used, which yielded identifiable groups of products, called communities, to which a customer can belong. The robustness of the model is confirmed by the stability of the groups generated over different periods of time. Under the same restrictions that the company imposes, recommendations are generated based on purchase histories and on the customers' membership in the different product groups. In this way, customers receive much more pertinent recommendations, rather than ones based only on what other customers also bought. This novel way of solving the customer segmentation problem improves the current recommendation method used by the Chilean wholesale supermarket chain by 140%, which translates into an increase of more than 430% in potential revenue.
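A minimal sketch of the graph-based approach this abstract describes follows, using networkx (an assumed implementation choice; the thesis does not name its tooling here). Products on the same receipt are linked, edge weights count co-occurrences, and product communities are extracted; the receipts are invented, whereas the thesis processes on the order of half a billion records.

```python
from itertools import combinations
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

receipts = [
    {"flour", "yeast", "sugar"},
    {"flour", "yeast"},
    {"beer", "chips"},
    {"beer", "chips", "soda"},
]

# Link products that appear on the same receipt; weights count co-occurrences.
G = nx.Graph()
for r in receipts:
    for a, b in combinations(sorted(r), 2):
        if G.has_edge(a, b):
            G[a][b]["weight"] += 1
        else:
            G.add_edge(a, b, weight=1)

# Product "communities" in the co-occurrence graph (here, the two clusters).
for community in greedy_modularity_communities(G, weight="weight"):
    print(sorted(community))
```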
Andery, Gabriel de Faria. "Integrando projeções multidimensionais à análise visual de redes sociais." Universidade de São Paulo, 2010. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-06102010-111345/.
For decades, social sciences researchers have searched for graphical forms to express human social relationships. The development of computer science, and more recently of the Internet, has given rise to a new field of research for visualization and social sciences professionals: social network visualization. This field can potentially offer new opportunities to reveal patterns that can benefit a large number of applications and individuals in fields such as commerce, security, knowledge networks and marketing. A large part of social network visualization algorithms and systems relies on graph representations, highlighting relationships amongst individuals and groups of individuals, but mostly neglecting the other available attributes of individuals. Thus, this work presents a set of tools to represent and explore social networks visually, taking into consideration the attributes of the nodes. The first technique employs heterogeneous networks, where both individuals and communities are represented in the graph; the second solution uses visualization techniques based on multidimensional projection, which place the data on the plane according to an attribute-based similarity criterion; still another proposed technique coordinates multiple views in order to speed up focusing on regions of interest in the data sets. The results indicate that the solutions provide a high degree of representational power and enable concept identification not easily obtained via other methods; the evidence comes from case studies as well as a user evaluation. This work includes a study of the graph visualization area for social network analysis, as well as a system implementing the proposed solutions, which integrates network visualization and multidimensional projections to extract patterns from social networks.
Paterlini, Adriano Arantes. "Imersão de espaços métricos em espaços multidimensionais para indexação de dados usando detecção de agrupamentos." Universidade de São Paulo, 2011. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-25042011-155810/.
The success of Database Management Systems (DBMSs) for applications with traditional data (numbers and short texts) has encouraged their use in new types of applications that require the manipulation of complex data. Time series, scientific data and other multimedia data are examples of complex data. Several application fields, like medical informatics, have demanded solutions for managing complex data. Complex data can also be studied by means of Knowledge Discovery in Databases (KDD) techniques, applying appropriate clustering algorithms. However, these algorithms have a high computational cost, hindering their use on large data sets. The techniques already developed in the database research field for indexing metric spaces usually assume the sets have a uniform distribution, without taking into account the existence of clusters in the data; therefore the structures need to generalize the efficiency of queries for the entire set simultaneously. However, similarity searching is often limited to a specific region of the data set. In this context, this dissertation proposes a new access method able to index metric data efficiently, especially for sets containing clusters. It also proposes a new algorithm for clustering metric data so that the selection of a medoid from a particular subset of elements becomes more efficient. The experimental results showed that the proposed algorithms FAMES and M-FAMES can be used as clustering techniques for complex data that outperform PAM, CLARA and CLARANS in effectiveness and efficiency. Moreover, the similarity searching performed with the proposed metric access method FAMESMAM proved to be especially appropriate for data sets with clusters.
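Since the contribution hinges on efficient medoid selection, here is the medoid notion itself in a few lines of Python: the medoid of a group is the element minimizing the sum of distances to the others. The points are invented, and FAMES's actual selection strategy is more elaborate than this brute-force version.

```python
import numpy as np

def medoid(points):
    pts = np.asarray(points, dtype=float)
    # Pairwise Euclidean distance matrix, then the row with the smallest sum.
    dists = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    return int(np.argmin(dists.sum(axis=1)))

pts = [[0, 0], [1, 0], [0.9, 0.2], [5, 5]]
print(medoid(pts))  # index 2, the point closest on average to the others
```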
Nagappan, Rajehndra Yegappan. "Mining multidimensional data through compositional visualisation." PhD thesis, 2001. http://hdl.handle.net/1885/146042.
Bjering, Heidi. "A framework for temporal abstractive multidimensional data mining." Thesis, 2008. http://handle.uws.edu.au:8081/1959.7/487616.
Wu, Chin-Ang, and 吳錦昂. "Towards Intelligent Data Warehouse Mining with Ontology - An Example for Multidimensional Association Rule Mining." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/85562411349734227075.
I-Shou University (義守大學), Ph.D. Program, Department of Information Engineering (資訊工程學系博士班), academic year 98 (ROC calendar).
Multidimensional association mining makes data mining robust because it provides more specific conditional settings of the target data for mining; thus managers can obtain useful knowledge that is close to their needs. Data warehousing plays a key role in decision support systems by providing knowledge discovery and data mining systems with cleaned and integrated data. In practice, data warehouse mining systems have provided many applicable solutions in industry, such as DBMiner by J. Han's research group. Yet there still exist problems that cost users extra effort to discover knowledge, cause them to miss the really useful knowledge they want, or leave them with knowledge that is out of date due to changes in the data warehouse or modifications of business rules. If intelligent assistance can be offered in a mining system, users can perform the mining process more effectively and efficiently, and the discovered knowledge can be renewed as necessary. We have observed the following insufficiencies that hinder such intelligent assistance: 1) lack of a semantic portrayal of the data, 2) lack of facilities for capturing users' mining intentions, and 3) lack of capabilities for actively renewing the discovered knowledge. Therefore, we present in this dissertation an intelligent data warehouse mining system that incorporates a schema ontology, a schema constraint ontology, a domain ontology and a user preference ontology. With the support of the knowledge in these ontologies, this system framework mainly provides intelligent assistance in the formulation of effective queries and offers an active mining mechanism. The structures of these ontologies are illustrated, and how they benefit the regular mining process and active mining is demonstrated by examples of multidimensional association rule mining. A prototype of the proposed system is provided, which enables multidimensional association mining and offers preliminary functions of the intelligent support by ontologies. Experiments to verify the effectiveness of the semantic checking and query recommendation provided by the system framework were conducted, and the effectiveness of surrogate queries was also tested. The results of the experiments show that the proposed intelligent data warehouse mining system incorporating ontologies can help users, especially inexperienced ones, improve their mining efficiency. We conclude that the proposed system framework brings users the following advantages: (1) it helps users clarify their mining intentions, (2) it finds concept-extended rules from an existing primitive data warehouse, (3) it allows mining constraints to be set more precisely, (4) it allows active re-mining of newly updated knowledge based on the user preference ontology, and (5) it offers automatic dispatch of specific re-mining results to specific users according to their preferences.
Yang, Yi-Bin, and 楊翊彬. "Efficient Workload for Multidimensional and Multilevel Association Rule Mining on Data Cubes." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/b26zfj.
National Central University (國立中央大學), Graduate Institute of Computer Science and Information Engineering (資訊工程研究所), academic year 96 (ROC calendar).
Association rule mining plays an important role in decision support systems; it finds interesting rules from huge amounts of historical data. In the past, when decision support systems used transactional databases as backends, research focused on performance improvements for mining association rules. Nowadays, a decision support system often comes with several frontends and a data warehouse as the backend; the frontends send preprocessed user queries and then fetch the requested data from the warehouse, while the central data warehouse has to respond to series of requests from different users, answering queries over historical data in multiple dimensions and at multiple levels. Efficiently answering mining queries on different dimensions and at different levels of abstraction is therefore an important issue for decision support systems. Based on some observations, we see that an analysis process includes a series of related queries and that many mining queries share common computation results. We propose an association rule mining system framework which processes queries as a workload, managing and optimizing materialized tables and reusing results among queries to complete the entire workload efficiently.
"A new approach to circular unidimensional scaling." 2002. http://library.cuhk.edu.hk/record=b5891287.
Full textThesis (M.Phil.)--Chinese University of Hong Kong, 2002.
Includes bibliographical references (leaves 78-80).
Abstracts in English and Chinese.
Table of contents:
Chapter 1: Introduction, p.1
1.1: Multidimensional Scaling (MDS), p.1
1.2: Unidimensional Scaling (UDS), p.15
1.3: Circular Unidimensional Scaling (CDS), p.17
1.4: The goodness of fit of models, p.24
1.5: The admissible transformations of the MDS configuration, p.26
Chapter 2: Computational Methods on MDS, UDS and CDS, p.29
2.1: Classical Scaling, p.29
2.2: Guttman's updating algorithm and Pliner's smoothing algorithm, p.36
2.3: Circular Unidimensional Scaling / Circumplex Model, p.43
Chapter 3: A new algorithm for CDS, p.45
3.1: Method of choosing a good starting value in Guttman's updating algorithm and Pliner's smoothing algorithm, p.46
3.2: A new approach for circular unidimensional scaling, p.54
3.3: Examples, p.62
3.3.1: Comparison of the new approach to the existing method, p.62
3.3.2: Illustrations of application to political data, p.64
Chapter 4: Conclusion and Extensions, p.67
Appendix A: Figures and Tables, p.70
Appendix B: References, p.78
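Section 2.1 of this table of contents refers to classical scaling; as a reference point, here is a compact Python sketch of Torgerson's classical scaling applied to an invented distance matrix (a sketch of the textbook method, not of the thesis's new CDS algorithm).

```python
import numpy as np

def classical_mds(D, dim=2):
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centring matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centred Gram matrix
    vals, vecs = np.linalg.eigh(B)             # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:dim]         # keep the top `dim` of them
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0))

# Distances of four collinear points: the second output column is ~0,
# revealing the essentially one-dimensional structure.
D = np.array([[0, 1, 2, 3],
              [1, 0, 1, 2],
              [2, 1, 0, 1],
              [3, 2, 1, 0]], dtype=float)
print(classical_mds(D))
```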
"Visual data mining: Using parallel coordinate plots with K-means clustering and color to find correlations in a multidimensional dataset." KUTZTOWN UNIVERSITY OF PENNSYLVANIA, 2009. http://pqdtopen.proquest.com/#viewpdf?dispub=1462396.
Full textCierco, Agliberto Alves. "Uso de sistemas multidimensionais e algoritmos de data mining para implantação do método Time Driven Activity Based Costing (TDABC) em organizações orientadas por projectos." Doctoral thesis, 2015. http://hdl.handle.net/10071/13456.
Full textQuando o método ABC (Activity Based Costing) foi apresentado para o rateio de custos de actividades de processos gerenciais, representou uma profunda modificação em relação aos métodos anteriormente utilizados. Logo ficaram patentes as enormes vantagens que trazia assim como os desafios em implementálo. O método TDABC (Time Driven Activity Based Costing) surgiu devido justamente às dificuldades operacionais do uso do ABC. Ao invés do uso de estimativas, normalmente dadas pelo corpo de funcionários da empresa, do percentual de tempo gasto em cada actividade, o TDABC propõe duas fundamentais mudanças em relação ao seu predecessor. A primeira é que se considera um tempo de inactividade em relação ao total de horas potencialmente trabalhadas (idle time). A segunda é que será calculado o tempo gasto por hora de trabalho. Nesse caso, o gasto em cada actividade será conduzido multiplicando-se esse valor por hora pelo total de horas requerido por ela. O método TDABC gera um resultado fundamental na hora que for implantado em uma empresa. São as chamadas equações de tempo para cada actividade. Nessas equações, é calculado o tempo gasto em cada actividade diante de diferentes níveis de complexidade na execução dessa. Todo esse trabalho só é possível diante da existência de sistemas de gestão integrada ERP (Enterprise Resource Planning) que registram cada acção na empresa. Nessa tese de doutoramento há duas propostas relativas a implantação do TDABC em empresas: A primeira é que o acompanhamento dos tempos de actividades seja feito por um sistema de ERP associado a um sistema de Business Intelligence (BI) ao invés de um sistema simples de ERP. A segunda proposta é decorrente da primeira. Sugere-se o uso de algoritmos de data mining (principalmente os algoritmos de árvore de indução e de análise de conglomerados), presentes nos sistemas de BI, para a detecção de níveis de complexidade nas equações de tempo. Como razão para a primeira proposta mostramos que sistemas de ERP jamais foram planejados para a detecção de padrões entre os dados neles armazenados. Portanto, sozinhos, eles não poderiam detectar os níveis de complexidade existentes na execução de uma mesma actividade. Para a segunda proposta mostramos que em organizações orientadas por projectos, ou que tenham departamentos que elaborem projectos e possam ser considerados como análogos a estas, a escala do número de actividades e seus dados gerados é tão ampla que gera a necessidade de um sistema automático de detecção de níveis de complexidade nessas actividades. A construção desses objectivos nessa tese segue a seguinte ordem: Primeiro é elaborada uma revisão do método ABC e as razões que levaram ao modelo subsequente TDABC. Em seguida apresenta-se também os conceitos de gerenciamento de projectos e Business Intelligence, notadamente a arquitectura multidimensional de dados e os algoritmos de data mining, introduzindo-se a maneira com que BI possibilita a diferenciação em níveis de complexidade nas equações de tempo. Para tanto faz-se uma introdução à linguagem MDX (Multidimensional Expression) de construção de relatórios em BI. Também se mostra, através de uma introdução aos sistemas de ERP, que esse tipo de sistema sozinho não viabilizaria esse tipo de resultado. Como forma de ilustrar todos esses conceitos é relatada a experiência de colecta de dados de actividades em projectos desenvolvidos em três organizações e a aplicação de BI para a geração das equações de tempo sobre esses dados.
ABC (Activity Based Cost) method was introduced in order to organize the way costs should be partitioned among enterprise management activities, and caused a deep change in the way this division used to be made. Soon it became quite clear the huge advantages of employing such method and the challenges associated with it. The TDABC method (Time Driven Activity Based Cost) was designed to overcome the operational difficulties in using ABC. Rather than employing estimates provided by the company employees, concerning the time spent on each management activity, TDABC suggest two pivotal changes in comparison with its predecessor. First, TDABC considers an idle time regarding the potential total time available for work. Second, TDABC calculates the cost spent per work hour. Therefore, the overall activity cost is reached by simple multiplication of this cost per hour by the number of work hours required by the activity. TDABC produces a fundamental output when it is employed in a company. It is the set of time equations for the management activities. Through these equations, it is possible to calculate the time spent in each activity considering also their different levels of complexities. This result is possible only due to ERP (Enterprise Resource Planning) systems that record every action being performed within the company. In this thesis, it is suggested two main initiatives concerning the usage of TDABC in enterprises. The first one is to employ a Business Intelligence (BI) system associated with an ERP system in order to track the time spent on the management activities. The second initiative is a consequence of the first. It is suggested the usage of Data Mining algorithms (mainly the algorithms for cluster analysis), available in BI suites, for the detection of the complexities levels within the time equations. As justification for the first initiative, it is shown that ERP systems were never designed to detect patterns within their databases. Therefore, without a BI module, it would be quite cumbersome for an ERP system to detect complexity levels in executing a management activity. For the second initiative, it is shown that an average enterprise produces a large-scale number of management activities, and tracking these activities generates a huge amount of data. The volume of information makes impossible to realize the levels of complexities inside the time equations without an automatic procedure to support it. The first part of this work is oriented to introduce a revision of the ABC and TDABC methods. Later, it is introduced the concepts of projects and project management. It is also presented some concepts about Business Intelligence systems and the multidimensional data architecture. The work also introduces the data mining algorithms that make available the detection of the complexity levels in management activities. It is also introduced the MDX( Multidimensional Expression ) language for building reports in BI systems as way to generate the proper sets of data for such detection. It is then reinforced the difficulties to perform this type of analysis in pure ERP systems. In order to illustrate these results it is reported a case study performed in three project management companies and the BI generation of time equations.
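To make the two TDABC ingredients concrete, a worked numeric sketch follows. All figures, the activity and its cost drivers are invented for illustration, not taken from the case study reported in the thesis.

```python
# Capacity cost rate: departmental cost divided by practical capacity
# (total hours minus idle time), here expressed in minutes.
total_cost = 84000.0          # monthly cost of the department (currency units)
practical_minutes = 42000.0   # practical capacity after idle time is removed
rate = total_cost / practical_minutes  # cost per minute of work = 2.0

def order_handling_minutes(n_lines, is_new_customer, needs_credit_check):
    # Time equation: a base time plus an increment per complexity driver.
    return 5.0 + 2.0 * n_lines + 10.0 * is_new_customer + 3.0 * needs_credit_check

t = order_handling_minutes(n_lines=4, is_new_customer=True, needs_credit_check=False)
print(t, rate * t)  # 23.0 minutes -> 46.0 units of activity cost
```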
Carvalho, Mariana Reimão Queiroga Valério de. "Enhancing the process of view selection in data cubes using What-If analysis." Doctoral thesis, 2019. http://hdl.handle.net/1822/66886.
To compete in today's society, enterprise managers need to be able to deal with the challenges arising from the competitive market. Increasing competition and the growing amount of electronic information imply new challenges related to decision-making processes. Collecting relevant information and using Business Intelligence tools are determining factors in decision-making processes and in gaining competitive advantage. However, gathering and storing relevant information may not be enough; the possibility of simulating hypothetical business scenarios could be the advantage that companies need, and What-If analysis can help to achieve it. What-If analysis allows the creation of simulation models to explore the behavior of a system by analyzing the effects of changing the values of parameters, effects that cannot otherwise be discovered by a manual analysis of historical data. A successful What-If analysis process depends mainly on the user's experience and on his or her knowledge of the business information and of the What-If analysis process itself; otherwise, it can turn into a long and difficult process, especially in the choice of input parameters for the analysis. In this doctoral thesis, a hybridization methodology is proposed that integrates OLAP preferences into the conventional process of What-If analysis. This integration aims to discover the best recommendations for the choice of input parameters for the analysis scenarios using OLAP preferences, helping the user to overcome the difficulties that normally arise in the conventional What-If analysis process. The developed methodology helps to discover more specific, oriented and detailed information that could not be discovered using the conventional What-If analysis process.
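As a minimal illustration of the What-If idea itself (with an invented linear demand response, not the methodology or data of the thesis), a base scenario can be cloned, perturbed in one input parameter, and re-simulated:

```python
base = {"price": 10.0, "units": 1000, "elasticity": -1.5}

def simulate(s):
    # Toy outcome measure: revenue for the scenario.
    return s["price"] * s["units"]

def what_if(s, price_change_pct):
    # Clone the scenario, change the price, and let demand react linearly.
    scenario = dict(s)
    scenario["price"] *= 1 + price_change_pct / 100
    scenario["units"] *= 1 + s["elasticity"] * price_change_pct / 100
    return simulate(scenario)

print(simulate(base))     # 10000.0: revenue in the base scenario
print(what_if(base, 10))  # 9350.0: revenue if price rises 10% and demand reacts
```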