Dissertations / Theses on the topic 'Data mining technologies'


Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 dissertations / theses for your research on the topic 'Data mining technologies.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Mamčenko, Jelena. "Data mining technologies for distributed servers' efficiency." Doctoral thesis, Lithuanian Academic Libraries Network (LABT), 2009. http://vddb.library.lt/obj/LT-eLABa-0001:E.02~2008~D_20090105_150115-82504.

Abstract:
The main idea is the application of data mining technologies to increase distributed servers' efficiency using data mining methods and agent technology. The objects of investigation are data from a document-based database and their use by distributed servers.
2

Rentzsch, Viola. "Human trafficking 2.0: the impact of new technologies." University of the Western Cape, 2021. http://hdl.handle.net/11394/8353.

Abstract:
Magister Legum - LLM
Human history is traversed by migration. This manifold global phenomenon has shaped the world to its current state, moving people from one place to another in reaction to the changing world. The autonomous decision to permanently move locations represents only a segment of what is considered to be migration. Routes can be dangerous, reasons can leave no alternative, displacements can be forced, and journeys deadly. Arguably the most fatal of all long-distance global migration flows, the transatlantic slave trade has left an enduring legacy of economic patterns and persistent pain. Whilst the trade in human beings originated centuries before, with Europe’s long history of slavery, this event represents an atrocious milestone in history. In a nutshell, European colonialists traded goods for slaves with African kings, who had captured them as war prisoners.
3

Jiang, Lu. "Advanced imaging and data mining technologies for medical and food safety applications." College Park, Md. : University of Maryland, 2009. http://hdl.handle.net/1903/9862.

Abstract:
Thesis (Ph.D.) -- University of Maryland, College Park, 2009.
Thesis research directed by: Fischell Dept. of Bioengineering. Title from title page of PDF. Includes bibliographical references. Published by UMI Dissertation Services, Ann Arbor, Mich. Also available in paper.
4

Espinoza, Sofia Elizabeth. "Data mining methods applied to healthcare problems." Diss., Georgia Institute of Technology, 2012. http://hdl.handle.net/1853/44903.

Abstract:
Growing adoption of health information technologies is allowing healthcare providers to capture and store enormous amounts of patient data. In order to use this data effectively to improve healthcare outcomes and processes, clinicians need to identify the relevant measures and apply the correct analysis methods for the type of data at hand. In this dissertation, we present various data mining and statistical methods that can be applied to the types of datasets found in healthcare research. We discuss the process of identifying appropriate measures and statistical tools, the analysis and validation of mathematical models, and the interpretation of results to improve healthcare quality and safety. We illustrate the application of statistics and data mining techniques on three real-world healthcare datasets. In the first chapter, we develop a new method to assess hydration status using breath samples. Through analysis of the more than 300 volatile organic compounds contained in human breath, we aim to identify markers of hydration. In the second chapter, we evaluate the impact of the implementation of an electronic medical record system on the rate of inpatient medication errors and adverse drug events. The objective is to understand the impact on patient safety of different information technologies in a specific environment (inpatient pediatrics) and to provide recommendations on how to correctly analyze count data with a large number of zeros. In the last chapter, we develop a mathematical model to predict the probability of developing post-operative nausea and vomiting based on patient demographics and clinical history, and to identify the group of patients at high risk.
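The last chapter's risk model lends itself to a compact illustration. The sketch below fits a logistic regression to predict the probability of post-operative nausea and vomiting; it is only a hedged sketch with invented predictors and synthetic data, not the model or variables from the dissertation.

    # Hypothetical PONV risk model sketch; features and data are invented.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n = 500
    X = np.column_stack([
        rng.normal(50, 15, n),   # age (synthetic)
        rng.integers(0, 2, n),   # female sex
        rng.integers(0, 2, n),   # history of motion sickness
        rng.integers(0, 2, n),   # post-operative opioid use
    ])
    y = rng.integers(0, 2, n)    # synthetic PONV outcome labels

    model = LogisticRegression().fit(X, y)
    # Predicted PONV probability for one new (invented) patient
    print(model.predict_proba([[63, 1, 1, 0]])[0, 1])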
5

Reipas, Artūras. "Verslo analizės metodų taikymas mažų įmonių informacinėse sistemose." Master's thesis, Lithuanian Academic Libraries Network (LABT), 2007. http://vddb.library.lt/obj/LT-eLABa-0001:E.02~2007~D_20070115_092850-31887.

Abstract:
In the current work, problems and requirements for demand forecasting in commercial and manufacturing enterprises are analyzed and suitable forecasting algorithms are proposed. In enterprises with multidimensional and heterogeneous demand, it is advisable to use different algorithms for different demand constituents and to readjust the parameters used for forecasting. Existing forecasting packages are impractical because they are not integrated with the management of commodity and material supply orders or with the business processes of the enterprise. The orders management system is developed with a forecasting component that adopts time series forecasting techniques such as moving average, exponential smoothing and double exponential smoothing. These techniques ensure reliable forecasting results for different time series models (random and trend) and are integrated with other business management activities. It is possible to calculate deviations of forecasted demand from actual values, to select the algorithms giving the minimal percentage error, and to adjust algorithm parameters to changing demand. The system can help managers choose forecasting algorithms and adapt their parameters over time. The system is designed using a UML CASE tool and implemented in the Microsoft .NET environment, using MS SQL Server 2005 for data storage.
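For readers unfamiliar with the smoothing techniques named above, here is a minimal sketch of simple and double exponential smoothing producing one-step-ahead forecasts; the smoothing constants and demand series are arbitrary, not the parameters tuned in the thesis.

    # Illustrative sketch of the forecasting techniques mentioned above.
    # Alpha/beta values are arbitrary; the thesis's parameter tuning is not shown.
    def exponential_smoothing(series, alpha=0.3):
        """Simple exponential smoothing; returns the one-step-ahead forecast."""
        level = series[0]
        for x in series[1:]:
            level = alpha * x + (1 - alpha) * level
        return level

    def double_exponential_smoothing(series, alpha=0.3, beta=0.1):
        """Holt's linear (double exponential) smoothing for trended demand."""
        level, trend = series[0], series[1] - series[0]
        for x in series[1:]:
            last_level = level
            level = alpha * x + (1 - alpha) * (level + trend)
            trend = beta * (level - last_level) + (1 - beta) * trend
        return level + trend  # one-step-ahead forecast

    demand = [12, 14, 13, 17, 18, 21, 22]
    print(exponential_smoothing(demand))         # suits a random (level) series
    print(double_exponential_smoothing(demand))  # suits a trended series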
6

Dagnely, Pierre. "Scalable Performance Assessment of Industrial Assets: A Data Mining Approach." Doctoral thesis, Universite Libre de Bruxelles, 2019. https://dipot.ulb.ac.be/dspace/bitstream/2013/288650/5/contratPD.pdf.

Abstract:
Nowadays, more and more industrial assets are continuously monitored and generate vast amounts of event logs and sensor data. Data mining is the field concerned with the exploration and exploitation of these data. Although data mining has been researched for decades, event log data are still underexploited in most data mining workflows, even though they could provide valuable insights on asset behavior, as they represent the internal processes of an asset. However, exploitation of event log data is challenging, mainly because: 1) event labels are not consistent across manufacturers; 2) assets report vast amounts of data of which only a small part may be relevant; 3) textual event logs and numerical sensor data are usually processed by methods dedicated to textual data or to sensor data respectively, and methods combining both types of data are still missing; 4) industrial data are rarely labelled, i.e. there is no indication of the actual performance of the asset, and it has to be derived from other sources; 5) the meaning of an event may vary depending on the events sent after or before it. Concretely, this thesis is concerned with the conception and validation of an integrated data processing framework for scalable performance assessment of industrial asset portfolios. This framework is composed of several advanced methodologies facilitating the exploitation of both event logs and time series sensor data: 1) an ontology model describing the photovoltaic (the validation domain) event system, allowing the integration of heterogeneous events generated by various manufacturers; 2) a novel and computationally scalable methodology enabling automatic calculation of an event relevancy score without any prior knowledge; 3) a semantically enriched multi-level pattern mining methodology enabling data exploration and hypothesis building across heterogeneous assets; 4) an advanced workflow extracting performance profiles by combining textual event logs and numerical sensor values; 5) a scalable methodology allowing rapid annotation of new asset runs with a known performance label based only on the event log data. The framework has been exhaustively validated on real-world data from PV plants, provided by our industrial partner 3E. However, the framework has been designed to be domain agnostic and can be adapted to other industrial assets reporting event logs and sensor data.
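The abstract does not spell out how the event relevancy score is computed; purely as a hedged analogy, a common prior-knowledge-free heuristic is to weight event types by inverse frequency so that rare events rank higher, as in this sketch with an invented log.

    # Hedged sketch: inverse-frequency relevancy scores for event types,
    # a generic prior-knowledge-free heuristic (not the thesis's method).
    import math
    from collections import Counter

    event_log = ["grid_ok", "grid_ok", "inverter_restart", "grid_ok",
                 "isolation_fault", "grid_ok", "inverter_restart"]

    counts = Counter(event_log)
    total = sum(counts.values())
    relevancy = {e: math.log(total / c) for e, c in counts.items()}
    for event, score in sorted(relevancy.items(), key=lambda kv: -kv[1]):
        print(f"{event:>20}: {score:.2f}")  # rare events rank highest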
Doctorat en Sciences de l'ingénieur et technologie
7

Orakzai, Faisal Moeen. "Movement Pattern Mining over Large-Scale Datasets." Doctoral thesis, Universite Libre de Bruxelles, 2019. https://dipot.ulb.ac.be/dspace/bitstream/2013/285611/4/TOC.pdf.

Abstract:
Movement pattern mining involves the processing of movement data to understand the mobility behaviour of humans or animals. It has numerous applications, e.g. traffic optimization, event planning, and optimization of public transport and carpooling. The recent digital revolution has caused widespread use of smartphones and other devices equipped with GPS. These devices produce a tremendous amount of movement data which contains valuable mobility information. Many interesting mobility patterns, and algorithms to mine them, have been proposed in recent years to capture different types of mobility behaviour, e.g. convoy, flock, group, swarm or platoon. The drastic increase in the volume of data being generated limits the use of these algorithms for mining movement patterns on real-world data sizes because of their lack of scalability. This thesis deals with three aspects of movement pattern mining, i.e. scalability, efficiency, and real-timeliness, with a focus on convoy pattern mining. A convoy pattern is a group of objects moving together for a certain period. Mining the convoy pattern involves clustering the movement dataset at each timestamp and then merging the clusters to form convoys. Clustering the whole dataset is a limiting factor in the scalability of existing algorithms. One way to solve the scalability problem is to mine convoys in parallel. Parallel mining can be done either using an existing distributed spatiotemporal data processing system like Parallel Secondo or by using a general distributed data processing system. We first test the scalability behaviour of Parallel Secondo for mining movement patterns and conclude that it is not an industrial-grade system and its scalability is limited. An essential part of designing distributed data processing algorithms is the data partitioning strategy. We study three different data partitioning strategies, i.e. object-based, spatial and temporal, and analyze their suitability to convoy pattern mining based on five properties, i.e. data exchange, data redundancy, partitioning cost, disk seeks and data ordering. Our study shows that the temporal partitioning strategy is best suited for convoy mining, as it is easily parallelizable and less complicated. The observations in our study also apply to other movement pattern mining algorithms, e.g. flock, group or platoon. Based on the temporal partitioning strategy, we propose a generic distributed shared-nothing convoy mining algorithm called DCM which is linearly scalable with respect to the data size, data density and the number of nodes. DCM can be implemented using any distributed data processing framework; for our experiments, we implemented it using the Hadoop MapReduce framework. It performs better than the existing sequential algorithms, i.e. the CuTs family of algorithms, by an order of magnitude on different computing architectures, e.g. a single x86 machine, a multi-core cluster with NUMA architecture and multi-node SMP clusters. Although DCM is a scalable distributed algorithm which can process huge datasets, the cost of maintaining the cluster is high, and the heavy computation it incurs because of the requirement of clustering the whole dataset is not resource-efficient. To solve the efficiency problem of DCM, we propose a new sequential algorithm called k/2-hop which, even being a sequential algorithm, can perform orders of magnitude faster than the existing state-of-the-art sequential as well as distributed algorithms.
The main strength of the algorithm is its pruning capability; our experiments show that it can prune up to 99% of the data. k/2-hop uses a notion of benchmark points, which are timestamps separated by k/2 timestamps, where k is the minimum length of the convoys to be mined. We prove that, to be able to mine maximal convoys, we need to cluster the data belonging to the benchmark points only. For the timestamps between two consecutive benchmark points, we propose an efficient mining algorithm called the Hop Window Mining Tree (HWMT). HWMT clusters the data corresponding to only those objects that are part of a cluster at the benchmark points. k/2-hop is a batch algorithm that can mine convoys very fast, but we only get the result when the complete dataset has been processed; it also requires the data to be indexed for better performance and thus cannot be used in real-time scenarios. We therefore propose a streaming variant of the k/2-hop algorithm which does not require the input dataset to be indexed and can process a stream of data, outputting the mined convoys as and when they are discovered. The streaming k/2-hop algorithm is very memory efficient and can process data that is many times bigger than the memory made available to the algorithm. We show through experiments that if we include the data loading and indexing time in the runtime of the k/2-hop algorithm, streaming k/2-hop is the fastest convoy mining algorithm to date. The convoy pattern is part of a bigger category of co-movement patterns, and most of the observations (if not all) made in this thesis about convoy pattern mining also apply to other patterns in the category, such as flock, group or platoon. This applicability means that a generic batch and streaming distributed co-movement pattern mining framework can be built using the k/2 technique.
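To make the benchmark-point idea concrete: if k is the minimum convoy length and benchmark points are placed every ⌈k/2⌉ timestamps, then every window of k consecutive timestamps contains at least one benchmark point, so no convoy of length k or more can be missed by clustering benchmarks only. A small sketch of that invariant with invented timestamps:

    # Sketch of k/2-hop benchmark points: with benchmarks every ceil(k/2)
    # timestamps, any k consecutive timestamps include at least one benchmark.
    import math

    def benchmark_points(timestamps, k):
        step = math.ceil(k / 2)
        return set(timestamps[::step])

    timestamps = list(range(100))   # invented timestamp domain
    k = 7                           # minimum convoy length
    marks = benchmark_points(timestamps, k)

    # Check the invariant for every window of k consecutive timestamps.
    assert all(any(t in marks for t in range(s, s + k))
               for s in range(len(timestamps) - k + 1))
    print(f"{len(marks)} of {len(timestamps)} timestamps need clustering")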
Doctorat en Sciences de l'ingénieur et technologie
8

Jiang, Haotian. "WEARABLE COMPUTING TECHNOLOGIES FOR DISTRIBUTED LEARNING." Case Western Reserve University School of Graduate Studies / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=case1571072941323463.

9

Jafer, Yasser. "Task Oriented Privacy-preserving (TOP) Technologies Using Automatic Feature Selection." Thesis, Université d'Ottawa / University of Ottawa, 2016. http://hdl.handle.net/10393/34320.

Abstract:
A large amount of digital information collected and stored in datasets creates vast opportunities for knowledge discovery and data mining. These datasets, however, may contain sensitive information about individuals, and it is therefore imperative to ensure that their privacy is protected. Most research in the area of privacy-preserving data publishing does not make any assumptions about an intended analysis task applied to the dataset. In many domains, however, such as healthcare and finance, it is possible to identify the analysis task beforehand. Incorporating such knowledge of the ultimate analysis task may improve the quality of the anonymized data while protecting the privacy of individuals. Furthermore, the existing research which considers the ultimate analysis task (e.g., classification) is not suitable for high-dimensional data. We show that automatic feature selection (a well-known dimensionality reduction technique) can be utilized to consider both aspects of privacy and utility simultaneously. In doing so, we show that feature selection can enhance existing privacy-preserving techniques addressing k-anonymity and differential privacy, protecting privacy while reducing the amount of modification applied to the dataset and hence, in most cases, achieving higher utility. We consider incorporating the concept of privacy-by-design within the feature selection process and propose techniques that turn filter-based and wrapper-based feature selection into privacy-aware processes. To this end, we build a layer of privacy on top of the regular feature selection process and obtain a privacy-preserving feature selection that is guided not only by accuracy but also by the amount of protected private information. In addition to considering privacy after feature selection, we introduce a framework for a privacy-aware feature selection evaluation measure. That is, we incorporate privacy during feature selection and obtain a list of candidate privacy-aware attribute subsets that consider (and satisfy) both efficacy and privacy requirements simultaneously. Finally, we propose a multi-dimensional, privacy-aware evaluation function which incorporates efficacy, privacy, and dimensionality weights and enables the data holder to obtain the best attribute subset according to its preferences.
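The exact form of the evaluation function is not given in the abstract; the following is only a plausible sketch in which (assumed) efficacy, privacy, and dimensionality scores of candidate attribute subsets are combined with data-holder-chosen weights.

    # Hypothetical sketch of a multi-dimensional, privacy-aware evaluation
    # function; the weights, score definitions, and subsets are invented.
    def subset_score(efficacy, privacy, dimensionality,
                     w_eff=0.5, w_priv=0.3, w_dim=0.2):
        """All inputs assumed normalized to [0, 1]; higher is better."""
        return w_eff * efficacy + w_priv * privacy + w_dim * (1 - dimensionality)

    # Candidate attribute subsets: (name, efficacy, privacy, relative size)
    candidates = [
        ("age+zip+diagnosis", 0.91, 0.40, 0.30),
        ("age_band+diagnosis", 0.86, 0.75, 0.20),
        ("diagnosis_only",     0.70, 0.95, 0.10),
    ]
    best = max(candidates, key=lambda c: subset_score(*c[1:]))
    print("preferred subset:", best[0])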
10

Inthasone, Somsack. "Techniques d'extraction de connaissances en biodiversité." Thesis, Nice, 2015. http://www.theses.fr/2015NICE4013/document.

Abstract:
Biodiversity data are generally stored in different formats. This makes it difficult for biologists to combine and integrate them in order to retrieve useful information and discover novel knowledge for the purpose of, for example, efficiently classifying specimens. In this work, we present the BioKET data warehouse, which is a consolidation of heterogeneous data stored in different formats and originating from different sources. For the time being, the scope of BioKET is botanical. Its construction required, among other things, identifying and analyzing existing botanical ontologies in order to standardize and relate the terms used in BioKET. We also developed a methodology for mapping and defining taxonomic terminologies, that is, controlled vocabularies with hierarchical structures, from authoritative plant ontologies and from the Google Maps and OpenStreetMap geospatial information systems. Data from four major biodiversity and botanical data providers and from the two previously mentioned geospatial information systems were then integrated into BioKET. The usefulness of such a data warehouse was demonstrated by applying classical knowledge pattern extraction methods, based on the classical Apriori and Galois closure approaches, to several datasets generated from BioKET extracts. Using these methods, association rules and conceptual bi-clusters were extracted to analyze the risk status of plants endemic to Laos and Southeast Asia. Besides, BioKET is interfaced with other applications and resources, like the GeoCAT Geospatial Conservation Assessment Tool, to provide a powerful analysis tool for biodiversity data.
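As a pointer to the kind of pattern extraction applied to the BioKET extracts, here is a minimal, self-contained frequent-itemset pass in the Apriori style over invented plant records; it illustrates the technique only and is not the warehouse's actual pipeline.

    # Minimal Apriori-style frequent itemset mining over invented records.
    from itertools import combinations

    records = [
        {"endemic_laos", "limestone", "threatened"},
        {"endemic_laos", "limestone"},
        {"limestone", "threatened"},
        {"endemic_laos", "limestone", "threatened"},
    ]
    min_support = 0.5

    items = sorted({i for r in records for i in r})
    frequent, size = {}, 1
    candidates = [frozenset([i]) for i in items]
    while candidates:
        counts = {c: sum(c <= r for r in records) for c in candidates}
        kept = {c: n / len(records) for c, n in counts.items()
                if n / len(records) >= min_support}
        frequent.update(kept)
        size += 1
        # Join step: build (size)-itemsets from items surviving so far
        kept_items = sorted({i for c in kept for i in c})
        candidates = [frozenset(c) for c in combinations(kept_items, size)]

    for itemset, support in sorted(frequent.items(), key=lambda kv: -kv[1]):
        print(set(itemset), round(support, 2))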
11

Hjalmarsson, Victoria. "Machine learning and Multi-criteria decision analysis in healthcare : A comparison of machine learning algorithms for medical diagnosis." Thesis, Mittuniversitetet, Avdelningen för informationssystem och -teknologi, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-33940.

Abstract:
Medical records contain large amounts of data. Nevertheless, in today’s digitized society it is difficult for humans to convert data into information and recognize hidden patterns. Effective decision support tools can assist medical staff in revealing important information hidden in the vast amount of data and support their medical decisions. The objective of this thesis is to compare five machine learning algorithms for clinical diagnosis: C4.5, Random Forest, Support Vector Machine (SVM), k-Nearest Neighbor (kNN) and the Naïve Bayes classifier. First, the machine learning algorithms are applied to three publicly available datasets. Next, the Analytic Hierarchy Process (AHP) is applied to evaluate which algorithms are more suitable than others for medical diagnosis. Evaluation criteria were chosen with respect to typical clinical criteria and narrowed down to five: sensitivity, specificity, positive predicted value, negative predicted value and interpretability. Given the results, Naïve Bayes and SVM receive the highest AHP scores, indicating that they are more suitable than the other tested algorithms as clinical decision support. In most cases kNN performed the worst and also received the lowest AHP score, which makes it the least suitable algorithm as support for medical diagnosis.
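Four of the five criteria named above follow directly from a binary confusion matrix; a small sketch with invented counts:

    # Clinical evaluation criteria from a binary confusion matrix
    # (invented counts; illustrates four of the AHP criteria).
    tp, fn, fp, tn = 80, 20, 10, 90

    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    ppv = tp / (tp + fp)           # positive predicted value (precision)
    npv = tn / (tn + fn)           # negative predicted value

    print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f} "
          f"PPV={ppv:.2f} NPV={npv:.2f}")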
12

Maaradji, Abderrahmane. "End-user service composition from a social networks analysis perspective." Thesis, Evry, Institut national des télécommunications, 2011. http://www.theses.fr/2011TELE0028/document.

Abstract:
Service composition has risen from the need to make information systems more flexible and open. The Service Oriented Architecture (SOA) has become the reference architecture model for applications, carried by the impetus of the Internet. In fact, information systems are able to expose interfaces through the Web, which has increased the number of available Web services. On the other hand, with the emergence of Web 2.0, service composition has evolved toward web users with limited technical skills. These end users, the so-called Y generation, participate, create, share and comment on content through the Web. This evolution in service composition is embodied in the reference paradigm of the Mashup and of Mashup editors such as Yahoo Pipes! This paradigm has established service composition within the end-user community, enabling users to meet their own needs, for instance by creating applications that do not exist. Additionally, Web 2.0 has brought its social dimension, allowing users to interact, either directly through online social networks or indirectly by sharing and modifying content or adding metadata. In this context, this thesis aims to support the evolving concept of service composition through meaningful contributions. The main contribution of this thesis is the introduction of the social dimension within the process of building a composite service through environments dedicated to end users. This concept considers the activity of composing services (creating a Mashup) as a social activity. This activity reveals social links between users based on their similarity in selecting and combining services. These links can be an interesting means of disseminating the expertise accumulated by users when composing services. In other terms, based on frequent composition patterns and on similarity between users, dynamic recommendations are proposed to a user while editing a Mashup. These recommendations aim to complete the initial part of the Mashup already introduced by the user. This concept has been explored through (i) step-by-step Mashup completion, recommending a single service at each step, and (ii) full Mashup completion, recommending the whole sequence of services that could complete the Mashup. Beyond pushing a vision for integrating the social dimension in the service composition process, this thesis has addressed a particular constraint of this recommendation system: the response-time requirements of interactive systems. In this regard, we have developed robust algorithms adapted to the specificities of our problem. Whereas a composite service is considered as a sequence of basic services, finding similarities between users comes down first to finding frequent patterns (subsequences) and then representing them in a data structure advantageous for the recommendation algorithm. The proposed algorithm, FESMA, meets exactly these requirements, building on the FSTREE structure, with interesting results compared to the prior art.
Finally, to implement the proposed algorithms and methods, we have developed a Mashup creation framework, called Social Composer (SoCo). This framework, dedicated to end users, first meets abstraction and usability requirements through a workflow-based graphic environment. It also implements all the mechanisms needed to deploy the composed service starting from an abstract description entered by the user. More importantly, SoCo has been augmented with a dynamic recommendation functionality, demonstrating the feasibility of this concept.
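The FSTREE structure itself is not described in the abstract; as a generic stand-in, the sketch below builds a plain prefix tree of past service sequences and suggests the most frequent next service for a partially built Mashup. Service names and histories are invented.

    # Generic prefix-tree sketch for next-service recommendation; a stand-in
    # for the FSTREE idea, not its actual implementation. Data is invented.
    from collections import defaultdict

    def build_tree(sequences):
        tree = defaultdict(int)            # prefix tuple -> frequency
        for seq in sequences:
            for end in range(1, len(seq) + 1):
                tree[tuple(seq[:end])] += 1
        return tree

    def suggest_next(tree, prefix):
        """Most frequent service observed right after `prefix`."""
        followers = {p[-1]: n for p, n in tree.items()
                     if len(p) == len(prefix) + 1 and p[:-1] == tuple(prefix)}
        return max(followers, key=followers.get) if followers else None

    history = [["rss", "filter", "map"],
               ["rss", "filter", "translate"],
               ["rss", "filter", "map"]]
    tree = build_tree(history)
    print(suggest_next(tree, ["rss", "filter"]))  # -> 'map'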
13

Suárez, Pacios Irene. "Impacts of peer-to-peer rental accommodation in Stockholm, Barcelona and Rio de Janeiro : An exploratory analysis of Airbnb’s data." Thesis, KTH, Maskinkonstruktion (Inst.), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-276694.

Abstract:
As part of the growing movement called the "peer-to-peer" economy, Airbnb has changed the short-stay rental market and has become one of the world's largest booking websites for finding accommodation. The platform has also affected the economy of tourism around the world, so, given the importance of the subject, this thesis studies the impacts that the Airbnb rental accommodation model has on clients in Stockholm, Barcelona and Rio de Janeiro. In this way, it analyzes how factors such as price, location and seasonality affect Airbnb customers in these cities. To do this, the three cities were first analyzed individually and then compared, using data from the Inside Airbnb website from 2010 to the present. The research was carried out through an exploratory analysis using the R programming language and is divided into three parts. First, the spatial data analysis shows that Airbnb's presence in all three cities has increased significantly in the past decade, growing from the most touristy parts of each city to surrounding areas. In addition, the largest number of Airbnb properties are apartments located near the city center and tourist attractions, which are also the areas most valued by Airbnb customers and the most expensive in which to rent a property. Secondly, a demand and price analysis was carried out, estimating the demand for Airbnb listings across the years since 2010 and across months. A significant increase in demand is apparent in the last decade, and demand also shows a seasonal pattern: in all three cases, the demand curve follows the city's climate, with the highest demand during the summer months, which corresponds to the most expensive period. Finally, through user review mining, customer opinion was studied by applying text mining to reviews. In this part of the research, word clouds were used as a visual representation of the text data, showing the most frequent words and analyzing what makes customers feel comfortable and uncomfortable.
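The thesis performed its analysis in R; purely as an illustration (and in Python for consistency with the other sketches in this listing), the word frequencies behind a review word cloud reduce to a few lines, with invented reviews and a throwaway stop-word list.

    # Tiny review-mining sketch: word frequencies behind a word cloud.
    from collections import Counter
    import re

    reviews = [
        "Great location, very clean and quiet apartment",
        "Noisy street but great host and great location",
        "Clean, comfortable, close to the city center",
    ]
    stopwords = {"and", "but", "the", "to", "very"}

    words = (w for r in reviews for w in re.findall(r"[a-z]+", r.lower())
             if w not in stopwords)
    print(Counter(words).most_common(5))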
14

Kulhavý, Lukáš. "Praktické uplatnění technologií data mining ve zdravotních pojišťovnách." Master's thesis, Vysoká škola ekonomická v Praze, 2010. http://www.nusl.cz/ntk/nusl-77726.

Abstract:
This thesis focuses on data mining technology and its possible practical use in the field of health insurance companies. The thesis defines the term data mining and its relation to the term knowledge discovery in databases. The term data mining is explained, inter alia, through methods describing the individual phases of the process of knowledge discovery in databases (CRISP-DM, SEMMA). There is also information about possible practical applications, technologies and products available in the market (both free and commercial products). An introduction of the main data mining methods and specific algorithms (decision trees, association rules, neural networks and other methods) serves as the theoretical basis on which the practical applications with real data from real health insurance companies are built. These are applications seeking the causes of increased remittances and predicting churn. I have solved these applications in the freely available systems Weka and LISP-Miner. The objective is to demonstrate data mining capabilities over this type of data and the capabilities of the Weka and LISP-Miner systems in solving tasks according to the CRISP-DM methodology. The last part of the thesis is devoted to the fields of cloud and grid computing in conjunction with data mining. It offers an insight into the possibilities of these technologies and their benefits to data mining. The possibilities of cloud computing are presented on the Amazon EC2 system; grid computing can be used via the Weka Experimenter interface.
15

Peroutka, Lukáš. "Návrh a implementace Data Mining modelu v technologii MS SQL Server." Master's thesis, Vysoká škola ekonomická v Praze, 2012. http://www.nusl.cz/ntk/nusl-199081.

Abstract:
This thesis focuses on the design and implementation of a data mining solution with real-world data. The task is analysed and processed, and its results are evaluated. The mined data set contains study records of students from the University of Economics, Prague (VŠE) over the past three years. The first part of the thesis focuses on the theory of data mining: the definition of the term and the history and development of this particular field. Current best practices and methodology are described, as well as methods for determining the quality of data and methods for data pre-processing ahead of the actual data mining task. The most common data mining techniques are introduced, including their basic concepts, advantages and disadvantages. The theoretical basis is then used to implement a concrete data mining solution with educational data. The source data set is described and analysed, and some of the data are chosen as input for the created models. The solution is based on the MS SQL Server data mining platform, and its goal is to find, describe and analyse potential associations and dependencies in the data. The results of the respective models are evaluated, including their potential added value. Possible extensions and suggestions for further development of the solution are also mentioned.
16

Samstad, Anna. "A simulation and machine learning approach to critical infrastructure resilience appraisal : Case study on payment disruptions." Thesis, Mittuniversitetet, Avdelningen för informationssystem och -teknologi, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-33745.

Abstract:
This study uses a simulation to gather data regarding a payment disruption. The simulation is part of a project called CCRAAAFFFTING, which examines what happens to a society when a payment disruption occurs. The purpose of this study is to develop a measure for resilience in the simulation and to use machine learning to analyse the attributes in the simulation to see how they affect the resilience of the society. Resilience is defined as "the ability to bounce back to a previous state", and the resilience measure is developed according to this definition. Two resilience measurements are defined: one which relates the simulated value to the best-case and worst-case scenarios, and another which takes the pace of change in values into consideration. These two measurements are then combined into one measure of the total resilience. The three machine learning algorithms compared are Neural Network, Support Vector Machine and Random Forest, and their performance measure is the error rate. The results show that Random Forest performs significantly better than the other two algorithms, and that the most important attributes in the simulation are those concerning the customers' ability to make purchases in the simulation. The developed resilience measure proves to respond logically to how the situation unfolds, and some suggestions to further improve the measurement are provided for future research.
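The abstract does not give the formulas; one plausible reading of the first measurement is a min-max normalization of the simulated value against the worst-case and best-case scenarios, sketched below with invented series. The thesis's actual definitions may differ.

    # Hedged sketch: one plausible form of the first resilience measurement,
    # min-max normalizing simulated values against worst/best-case scenarios.
    def resilience(simulated, worst, best):
        return [1.0 if b == w else (s - w) / (b - w)
                for s, w, b in zip(simulated, worst, best)]

    simulated = [100, 60, 45, 70, 95]    # e.g. completed purchases per tick
    worst     = [100, 20, 10, 15, 20]    # disruption with no mitigation
    best      = [100, 100, 100, 100, 100]
    scores = resilience(simulated, worst, best)
    print([round(s, 2) for s in scores])  # 1.0 = fully bounced back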
17

Waldmannová, Lenka. "Využití technologií data mining v rámci interaktivního smlouvání v retailu." Master's thesis, Vysoká škola ekonomická v Praze, 2013. http://www.nusl.cz/ntk/nusl-197474.

Abstract:
This thesis deals with the issue of data mining technology within interactive bargaining in retail, with a closer focus on the implementation of data models and related rules that support interactive haggling with the customer. The aim of the thesis is to prepare proposals for the data models, calculations and rules involved in the haggling process with the customer. The prepared outputs are used in the demo application "Bargaining", and their application is showcased in demonstrated examples. The work is divided into two main areas: the theoretical definition of the work and the analytical processing. The theory part draws on available research to characterise the various thematic areas of the work: Customer Intelligence, Customer Lifetime Value, data mining, interactive marketing, negotiation techniques and price setting. The analytical processing is focused on the practical use of the acquired knowledge: the calculation of selected values and their processing and application within the bargaining process with the customer. In conclusion, the results, the achievement of the goals and the recommendations for the development or modification of the solution are assessed. The anticipated benefit of the application is its use in negotiation between the trader and the customer, without the agreed sales price dropping below cost. The point is to ensure an interaction where the customer is allowed to negotiate the price of goods while the trader ensures that the price will not result in a loss. The benefit of the solution is the support of customer satisfaction with regard to the financial interests of the retail industry.
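The core rule described, that haggling may never push the agreed price below cost, can be stated in a few lines; the prices and margin below are invented, not the thesis's calculations.

    # Minimal sketch of the floor-price rule described above: the trader
    # accepts or counters so the final price never falls below cost.
    def respond_to_offer(offer, cost, list_price, min_margin=0.02):
        floor = cost * (1 + min_margin)      # lowest acceptable price
        if offer >= floor:
            return ("accept", offer)
        return ("counter", max(floor, (offer + list_price) / 2))

    print(respond_to_offer(offer=95.0, cost=90.0, list_price=120.0))  # accept
    print(respond_to_offer(offer=80.0, cost=90.0, list_price=120.0))  # counter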
18

Misevičiūtė, Vita. "Medicininių duomenų informacijos sistemos, naudojančios objektines technologijas, tyrimas." Master's thesis, Lithuanian Academic Libraries Network (LABT), 2006. http://vddb.library.lt/obj/LT-eLABa-0001:E.02~2006~D_20060110_130335-74769.

Abstract:
Information systems for storing, managing and retrieving medical data are specific because the data have many different interrelations and consist of text and multimedia. The purpose of this work is to propose and develop a medical data system which uses an object-oriented database. The user can store various medical data and perform data mining by selecting the desired criteria. After an analysis of object-oriented technologies, an object-oriented programming environment and an object-oriented database management system were chosen. For the implementation, suitable technologies were analyzed according to the specifics of medical data. The advantages of these technologies were shown by an experiment. The developed information system is functional and flexible enough to work with evolving medical data.
19

Mamčenko, Jelena. "Duomenų gavybos technologijų taikymas išskirstytų serverių darbui gerinti." Doctoral thesis, Lithuanian Academic Libraries Network (LABT), 2009. http://vddb.library.lt/obj/LT-eLABa-0001:E.02~2008~D_20090105_150124-79076.

Abstract:
The main idea is the application of data mining technologies to increase distributed servers' efficiency using data mining methods and agent technology. The objects of investigation are data from a document-based database and their use by distributed servers.
20

Jurčák, Petr. "Získávání znalostí z multimediálních databází." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2009. http://www.nusl.cz/ntk/nusl-236662.

Abstract:
This master's thesis is dedicated to the theme of knowledge discovery in multimedia databases, especially the basic methods of classification and prediction used for data mining. Another part describes the extraction of low-level features from video data and images and summarizes information about content-based search in multimedia content and the indexing of this type of data. The final part is dedicated to the implementation of a Gaussian mixture model for classification and compares the final results with another method, SVM.
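A common way to classify with Gaussian mixtures, plausibly the outline of the comparison with SVM, is to fit one mixture per class and assign new samples to the class with the higher likelihood; a minimal sketch on synthetic features (not the thesis's actual pipeline):

    # Minimal per-class Gaussian mixture classifier on synthetic features.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(1)
    X0 = rng.normal(0.0, 1.0, (200, 8))   # synthetic features, class 0
    X1 = rng.normal(1.5, 1.0, (200, 8))   # class 1

    gmm0 = GaussianMixture(n_components=3, random_state=0).fit(X0)
    gmm1 = GaussianMixture(n_components=3, random_state=0).fit(X1)

    X_test = np.vstack([rng.normal(0.0, 1.0, (5, 8)),
                        rng.normal(1.5, 1.0, (5, 8))])
    # Assign each sample to the class whose mixture gives higher log-likelihood
    pred = (gmm1.score_samples(X_test) > gmm0.score_samples(X_test)).astype(int)
    print(pred)   # expected roughly [0 0 0 0 0 1 1 1 1 1]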
21

Peoples, Bruce E. "Méthodologie d'analyse du centre de gravité de normes internationales publiées : une démarche innovante de recommandation." Thesis, Paris 8, 2016. http://www.theses.fr/2016PA080023.

Abstract:
"Standards make a positive contribution to the world we live in. They facilitate trade, spread knowledge, disseminate innovative advances in technology, and share good management and conformity assessment practices." There are a multitude of standard and standard consortia organizations producing market-relevant standards, specifications, and technical reports in the domain of Information Communication Technology (ICT). With the number of ICT-related standards and specifications numbering in the thousands, it is not readily apparent to users how these standards inter-relate to form the basis of technical interoperability. There is a need to develop and document a process to identify how standards inter-relate to form a basis of interoperability in multiple contexts: at a general horizontal technology level that covers all domains, and within specific vertical technology domains and sub-domains. By analyzing which standards inter-relate through normative referencing, key standards can be identified as technical centers of gravity, allowing identification of the specific standards that are required for the successful implementation of standards that normatively reference them, and that form a basis for interoperability across horizontal and vertical technology domains. This thesis focuses on defining a methodology to analyze ICT standards to identify normatively referenced standards that form technical centers of gravity, utilizing Data Mining (DM) and Social Network Analysis (SNA) graph technologies as a basis of analysis. As a proof of concept, the methodology focuses on the International Standards (IS) published by the International Organization for Standardization/International Electrotechnical Commission, Joint Technical Committee 1, Sub-committee 36 Learning, Education, and Training (ISO/IEC JTC1 SC36). The process is designed to be scalable to larger document sets within ISO/IEC JTC1 covering all JTC1 Sub-Committees, and possibly other Standards Development Organizations (SDOs).

Chapter 1 provides a review of the literature of previous standard analysis projects and of the components used in this thesis, such as data mining and graph theory. Identification of a dataset for testing the developed methodology, containing published International Standards needed for analysis and forming specific technology domains and sub-domains, is the focus of Chapter 2. Chapter 3 describes the specific methodology developed to analyze published International Standards documents, and to create and analyze the graphs to identify technical centers of gravity. Chapter 4 presents the analysis of data which identifies technical-center-of-gravity standards for ICT learning, education, and training standards produced in ISO/IEC JTC1 SC36. Conclusions of the analysis are contained in Chapter 5. Recommendations for further research using the output of the developed methodology are contained in Chapter 6.
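To illustrate the center-of-gravity idea at toy scale: build a directed graph whose edges mean "normatively references" and rank standards by how heavily they are referenced. The sketch below uses networkx with invented reference edges.

    # Toy center-of-gravity ranking over a normative-reference graph.
    # Standard identifiers and edges are invented for illustration.
    import networkx as nx

    G = nx.DiGraph()
    # Edge A -> B means "standard A normatively references standard B".
    G.add_edges_from([
        ("ISO/IEC 19788-2", "ISO/IEC 19788-1"),
        ("ISO/IEC 19788-3", "ISO/IEC 19788-1"),
        ("ISO/IEC 20748-1", "ISO/IEC 19788-1"),
        ("ISO/IEC 20748-1", "ISO/IEC 2382"),
    ])
    # Highly referenced standards act as technical centers of gravity.
    ranking = sorted(nx.in_degree_centrality(G).items(),
                     key=lambda kv: -kv[1])
    for std, score in ranking[:3]:
        print(f"{std}: {score:.2f}")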
22

Norguet, Jean-Pierre. "Semantic analysis in web usage mining." Doctoral thesis, Universite Libre de Bruxelles, 2006. http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/210890.

Abstract:
With the emergence of the Internet and of the World Wide Web, the Web site has become a key communication channel in organizations. To satisfy the objectives of the Web site and of its target audience, adapting the Web site content to the users' expectations has become a major concern. In this context, Web usage mining, a relatively new research area, and Web analytics, the part of Web usage mining that has emerged most in the corporate world, offer many Web communication analysis techniques. These techniques include prediction of the user's behaviour within the site, comparison between expected and actual Web site usage, adjustment of the Web site with respect to the users' interests, and mining and analyzing Web usage data to discover interesting metrics and usage patterns. However, Web usage mining and Web analytics suffer from significant drawbacks when it comes to supporting the decision-making process at the higher levels in the organization.

Indeed, according to organizations theory, the higher levels in the organizations need summarized and conceptual information to take fast, high-level, and effective decisions. For Web sites, these levels include the organization managers and the Web site chief editors. At these levels, the results produced by Web analytics tools are mostly useless. Indeed, most of these results target Web designers and Web developers. Summary reports like the number of visitors and the number of page views can be of some interest to the organization manager but these results are poor. Finally, page-group and directory hits give the Web site chief editor conceptual results, but these are limited by several problems like page synonymy (several pages contain the same topic), page polysemy (a page contains several topics), page temporality, and page volatility.

Web usage mining research projects on their part have mostly left aside Web analytics and its limitations and have focused on other research paths. Examples of these paths are usage pattern analysis, personalization, system improvement, site structure modification, marketing business intelligence, and usage characterization. A potential contribution to Web analytics can be found in research about reverse clustering analysis, a technique based on self-organizing feature maps. This technique integrates Web usage mining and Web content mining in order to rank the Web site pages according to an original popularity score. However, the algorithm is not scalable and does not answer the page-polysemy, page-synonymy, page-temporality, and page-volatility problems. As a consequence, these approaches fail at delivering summarized and conceptual results.

An interesting attempt to obtain such results has been the Information Scent algorithm, which produces a list of term vectors representing the visitors' needs. These vectors provide a semantic representation of the visitors' needs and can be easily interpreted. Unfortunately, the results suffer from term polysemy and term synonymy, are visit-centric rather than site-centric, and are not scalable to produce. Finally, according to a recent survey, no Web usage mining research project has proposed a satisfying solution to provide site-wide summarized and conceptual audience metrics.

In this dissertation, we present our solution to answer the need for summarized and conceptual audience metrics in Web analytics. We first describe several methods for mining the Web pages output by Web servers. These methods include content journaling, script parsing, server monitoring, network monitoring, and client-side mining. These techniques can be used alone or in combination to mine the Web pages output by any Web site. Then, the occurrences of taxonomy terms in these pages can be aggregated to provide concept-based audience metrics. To evaluate the results, we implement a prototype and run a number of test cases with real Web sites.

According to the first experiments with our prototype and SQL Server OLAP Analysis Service, concept-based metrics prove extremely summarized and much more intuitive than page-based metrics. As a consequence, concept-based metrics can be exploited at higher levels in the organization. For example, organization managers can redefine the organization strategy according to the visitors' interests. Concept-based metrics also give an intuitive view of the messages delivered through the Web site and allow to adapt the Web site communication to the organization objectives. The Web site chief editor on his part can interpret the metrics to redefine the publishing orders and redefine the sub-editors' writing tasks. As decisions at higher levels in the organization should be more effective, concept-based metrics should significantly contribute to Web usage mining and Web analytics.
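In outline, the concept-based metrics described above amount to counting taxonomy-term occurrences in the pages served and weighting them by traffic; a toy sketch with invented pages and view counts:

    # Toy sketch of concept-based audience metrics: occurrences of taxonomy
    # terms in served pages, weighted by page views. All data is invented.
    from collections import Counter

    taxonomy = {"pricing", "support", "careers"}
    pages = {  # page -> (views, output text)
        "/offers":  (120, "pricing plans and pricing tiers with support"),
        "/jobs":    (30,  "careers at the company, careers faq"),
        "/contact": (50,  "support hours and support contacts"),
    }

    metrics = Counter()
    for views, text in pages.values():
        for word in text.lower().split():
            if word in taxonomy:
                metrics[word] += views   # weight occurrences by audience size

    print(metrics.most_common())  # concept-level view of audience interests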


Doctorat en sciences appliquées
23

Kříž, Jan. "Business Intelligence řešení pro společnost 1188." Master's thesis, Vysoké učení technické v Brně. Fakulta podnikatelská, 2015. http://www.nusl.cz/ntk/nusl-224859.

Abstract:
The aim of this master's thesis is to create a Business Intelligence solution for the company 1188. The resulting Business Intelligence solution will enable the company's management to make more accurate decisions, aligned with the company's strategy.
24

Saoudi, Massinissa. "Conception d'un réseau de capteurs sans fil pour des prises de décision à base de méthodes du Data Mining." Thesis, Brest, 2017. http://www.theses.fr/2017BRES0065/document.

Abstract:
Recently, Wireless Sensor Networks (WSNs) have emerged as one of the most exciting research fields. A common challenge across sensor network applications is the vulnerability of sensor nodes, due both to their limited resources and to the nature of the data they generate: large in volume, heterogeneous, arriving continuously at high speed, and coming from distributed locations. The need to process and extract knowledge from these large quantities of data motivated us to explore data mining techniques and to develop new approaches that improve detection accuracy, information quality, data-size reduction, and knowledge extraction from WSN datasets to support decision making. Classical data mining methods, however, are not directly applicable to WSNs because of the constraints of the sensor nodes. The objective is therefore twofold: an efficient solution that adapts classical data mining methods to the analysis of the huge, continuously arriving data produced by WSNs while respecting the nodes' constraints, and the extraction of as much knowledge as possible in order to make better decisions. The contributions of this thesis focus mainly on several distributed algorithms, based on data mining techniques, that cope with the nature of the sensed data and the resource constraints of sensor nodes: each node first performs a local computation and then exchanges messages with its neighbors in order to reach consensus on a global model. The results show that the proposed approaches considerably reduce energy consumption and communication cost, which extends the network lifetime, and that they are highly efficient in terms of model computation, latency, data-size reduction, adaptability, and event detection.
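As a rough illustration of the local-computation-plus-neighbor-exchange scheme the abstract describes, here is a minimal gossip-averaging sketch in Python; the topology, readings, and number of rounds are invented, and the thesis's actual distributed algorithms are considerably more elaborate.

```python
import random

# Each node computes a local statistic (here, the mean of its readings) and
# repeatedly averages it with a random neighbor; pairwise averaging preserves
# the sum, so all estimates converge towards the average of the initial
# node estimates -- a simple consensus on a "global model".
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
readings = {0: [21.0, 22.5], 1: [20.1, 19.8], 2: [25.3, 24.9], 3: [23.0]}

estimate = {n: sum(r) / len(r) for n, r in readings.items()}

random.seed(0)
for _ in range(200):                         # gossip rounds
    node = random.choice(list(neighbors))
    peer = random.choice(neighbors[node])
    avg = (estimate[node] + estimate[peer]) / 2   # pairwise averaging step
    estimate[node] = estimate[peer] = avg

print(estimate)  # all nodes end up close to the consensus value
```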
APA, Harvard, Vancouver, ISO, and other styles
25

Sammouri, Wissam. "Data mining of temporal sequences for the prediction of infrequent failure events : application on floating train data for predictive maintenance." Thesis, Paris Est, 2014. http://www.theses.fr/2014PEST1041/document.

Full text
Abstract:
In order to meet mounting social and economic demands, railway operators and manufacturers are striving for longer availability and better reliability of railway transportation systems. Commercial trains are being equipped with state-of-the-art on-board intelligent sensors that monitor various subsystems all over the train. These sensors provide a real-time flow of data, called floating train data, consisting of georeferenced events together with their spatial and temporal coordinates. Once ordered with respect to time, these events can be considered long temporal sequences that can be mined for possible relationships. This has created a need for sequential data mining techniques to derive meaningful association rules or classification models from these data. Once discovered, these rules and models can be used for on-line analysis of the incoming event stream in order to predict the occurrence of target events, i.e., severe failures that require immediate corrective maintenance actions. This thesis tackles that data mining task. We investigate and develop various methodologies to discover association rules and classification models that can help predict rare tilt and traction failures in sequences, using past events that are less critical. The investigated techniques fall along two major axes: association analysis, which is temporal, and classification techniques, which are not. The main challenges confronting the task, and increasing its complexity, are the rarity of the target events to be predicted, the heavy redundancy of some events, and the frequent occurrence of data bursts. The results obtained on real datasets collected from a fleet of commercial trains highlight the effectiveness of the proposed approaches and methodologies.
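The following Python sketch, not from the thesis, illustrates the basic idea of scoring a temporal rule "event A is followed by target event T within a given horizon" by its confidence; the event stream and event codes are invented for illustration.

```python
# Each event is a (timestamp, code) pair; "T" stands for the rare target
# failure to be predicted, "A" for a less critical antecedent event.
events = [(1, "A"), (3, "B"), (4, "T"), (10, "A"), (15, "B"),
          (12, "T"), (20, "A"), (29, "T")]
events.sort()  # order the stream by time

def rule_confidence(events, antecedent, target, horizon):
    """Confidence of the rule: antecedent -> target within `horizon`."""
    occurrences, hits = 0, 0
    for i, (t, e) in enumerate(events):
        if e != antecedent:
            continue
        occurrences += 1
        # does the target occur within the prediction horizon?
        if any(e2 == target and t < t2 <= t + horizon
               for t2, e2 in events[i + 1:]):
            hits += 1
    return hits / occurrences if occurrences else 0.0

print(rule_confidence(events, "A", "T", horizon=5))  # 2 of 3 -> 0.666...
```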
APA, Harvard, Vancouver, ISO, and other styles
26

Miled, Mahdi. "Ressources et parcours pour l'apprentissage du langage Python : aide à la navigation individualisée dans un hypermédia épistémique à partir de traces." Thesis, Cachan, Ecole normale supérieure, 2014. http://www.theses.fr/2014DENS0045/document.

Full text
Abstract:
This research work mainly concerns assistance with individualized navigation through an epistemic hypermedia. We have a set of resources that can be formalized as a directed acyclic graph (DAG): the graph of epistemes. After reviewing resource and pathway environments, methods of visualization and navigation, tracking, adaptation, and data mining, we present an approach that correlates design and editing activities with those dedicated to the use of, and navigation through, the resources. The goal is to provide mechanisms for individualizing navigation in an environment intended to be evolutive. We then built prototypes to put the graph of epistemes to the test. One of these prototypes was integrated into an existing platform. This epistemic hypermedia, called HiPPY, provides resources and pathways for learning the Python language. It is based on a graph of epistemes, dynamic navigation, and a personalized knowledge diagnosis. This prototype was evaluated in an experiment that allowed us to assess the principles introduced and to analyze certain usage patterns.
APA, Harvard, Vancouver, ISO, and other styles
27

Medlej, Maguy. "Big data management for periodic wireless sensor networks." Thesis, Besançon, 2014. http://www.theses.fr/2014BESA2029/document.

Full text
Abstract:
This thesis proposes novel big data management techniques for periodic sensor networks that embrace the limitations imposed by WSNs and the nature of sensor data. First, we propose an adaptive sampling approach for periodic data collection that allows each sensor node to adapt its sampling rate to the changing dynamics of the physical environment, thereby reducing over-sampling and energy consumption; it is based on the dependence of the conditional variance of measurements over time, within one period or across several periods. We then propose a multiple-level activity model, using behavioral functions modeled by modified Bezier curves, to define application classes and allow adaptive sampling rates that respect application requirements. Next, we address periodic data aggregation as a way to reduce the size of massive sensor data, introducing two tree-based, bi-level periodic data aggregation techniques. The first looks, on a periodic basis, at each measurement captured at the first tier and cleans the data periodically while conserving the number of occurrences of each captured measure. The second performs aggregation between groups of nodes at the aggregator level while preserving the quality of the information: we propose a data aggregation approach that identifies near-duplicate nodes generating similar sets of collected data in periodic applications, using similarity functions between sets of measurements; we suggest a prefix filtering approach to optimize the computation of similarity values, and we define a new filtering technique based on the quality of information to overcome the data latency challenge. Finally, we propose a data mining method, based on the existing k-means clustering algorithm, to classify the aggregated data into similar clusters while overcoming the high computational cost: a multi-level optimized version of k-means that applies the algorithm to the prefixes of the measurement series instead of the complete series. All the proposed approaches are validated through in-depth performance studies using simulation (OMNeT++) on real data generated by a periodic wireless sensor network, and are compared with existing approaches from the literature.
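A minimal sketch of the variance-driven adaptive sampling idea mentioned in the abstract, assuming a node compares the variance of its latest period of readings against the previous one; the thresholds and rates are invented, and the thesis's model based on conditional variance and Bezier curves is considerably richer.

```python
import statistics

# If readings in the latest period vary little compared with the previous
# one, the node lowers its sampling rate (saving energy); if the dynamics
# pick up again, it raises the rate.
def next_rate(prev_period, curr_period, rate, min_rate=1, max_rate=16):
    ratio = (statistics.pvariance(curr_period)
             / (statistics.pvariance(prev_period) + 1e-9))
    if ratio < 0.5:          # environment is stabilizing: sample less
        return max(min_rate, rate // 2)
    if ratio > 2.0:          # environment is changing fast: sample more
        return min(max_rate, rate * 2)
    return rate

# temperature readings from two consecutive periods (invented)
print(next_rate([20.1, 20.0, 20.2, 20.1], [20.1, 20.1, 20.1, 20.0], rate=8))  # 4
```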
APA, Harvard, Vancouver, ISO, and other styles
28

König, Hampus. "Evaluation of detector Mini-EUSO to study Ultra High-Energy Cosmic Rays and Ultra Violet light emissions observing from the International Space Station." Thesis, Luleå tekniska universitet, Institutionen för system- och rymdteknik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-72552.

Full text
Abstract:
Under the name EUSO, or Extreme Universe Space Observatory, fall multiple instruments, some currently under design or construction and others that have concluded their missions. The main goal they share is to detect and analyse cosmic rays with very high energies by using the Earth's atmosphere as a detector. One instrument, Mini-EUSO, will be placed on the International Space Station during 2019; its engineering model is currently being used to collect data and test the function of different components. The engineering model differs from the full-scale instrument and can also be used for other purposes. In this thesis, some of the collected data is used to analyse and compare the engineering model's specifications to those of the full Mini-EUSO instrument, with a focus on the field of view, inert areas on the sensor, and its general function. Objects such as stars, meteors, and satellites were also detected and used in the tests. In addition, a short test was made of the possibility of using the instrument to detect plastic residing in the ocean, by exploiting the fluorescent properties of the plastics. The thesis concludes that some adjustments needed to be made to the engineering model, but also that its specifications were within the expected ranges. Several of the objects found can also be used to improve detection algorithms. Moreover, the preliminary tests regarding plastic detection in the ocean gave positive results.
APA, Harvard, Vancouver, ISO, and other styles
29

Bournez, Colin. "Conception d'un logiciel pour la recherche de nouvelles molécules bioactives." Thesis, Orléans, 2019. http://www.theses.fr/2019ORLE3043.

Full text
Abstract:
Kinases belong to a family of proteins heavily involved in several aspects of cell control, including division and signaling. They are often associated with serious pathologies such as cancer and therefore represent important therapeutic targets in medicinal chemistry. It has become difficult to design new, innovative kinase inhibitors, particularly because the active sites of these proteins are highly similar, which causes selectivity issues. One of the main experimental methods in use, now well proven, is fragment-based drug design. We therefore developed our own software, Frags2Drugs, which uses this approach to build bioactive molecules. Frags2Drugs relies on publicly available experimental data, especially co-crystallized ligands bound to protein kinase structures. We first developed a new fragmentation method for these ligands to obtain a library of several thousand three-dimensional fragments. The library is stored as a graph, where each fragment corresponds to a node and each relation, representing a possible chemical bond between two fragments, to an edge between the corresponding nodes. We then developed an algorithm to compute all possible combinations of the available fragments, directly in the binding site of the target. Frags2Drugs can quickly create thousands of molecules from an initial user-defined fragment (the seed), and many methods for filtering the results were implemented so that only the most promising compounds are retained. The software was validated on three protein kinases involved in different cancers. The proposed molecules were then synthesized and showed excellent in vitro activity.
APA, Harvard, Vancouver, ISO, and other styles
30

Agier, Marie. "De l'analyse de données d'expression à la reconstruction de réseau de gènes." Phd thesis, Université Blaise Pascal - Clermont-Ferrand II, 2006. http://tel.archives-ouvertes.fr/tel-00717382.

Full text
Abstract:
The first part of this work concerns the pre-processing and analysis of expression data within two main projects whose overall objective is to improve the diagnosis and prognosis of breast cancer and to better target treatments. An optimized pre-processing pipeline for expression data was put in place, and several analyses were carried out that identified genes highlighting an expression profile characteristic of patients' lymph node status and of the therapeutic response to a particular chemotherapy treatment, docetaxel. A new technique for reconstructing gene networks based on the notion of rules between genes is then proposed, the idea being to offer biologists the possibility of choosing, among several semantics, the meaning of the rules they wish to generate. The originality of this work is to propose a global framework that can accommodate a large number of rule semantics and to use identical generation, post-processing, and visualization methods for all the proposed semantics. The notion of well-formed semantics, i.e., semantics for which Armstrong's axioms are sound and complete, is introduced. A result is also given that makes it possible to determine simply whether or not a semantics is well-formed. A visualization of the rules in global form, i.e., covering several semantics, is proposed. Finally, this approach was implemented as a module called RG, integrated into gene expression data analysis software, the MeV tool from TIGR. This work falls within the emerging field of gene expression data mining. It was carried out under a CIFRE contract within three organizations: the company Diagnogène, LIMOS, and the Centre de Lutte contre le Cancer de la Région Auvergne.
APA, Harvard, Vancouver, ISO, and other styles
31

Meza, Fernandez Sandra. "Enseigner et apprendre en ligne : vers un modèle de la navigation sur des sites Web de formation universitaire." Phd thesis, Université de Strasbourg, 2013. http://tel.archives-ouvertes.fr/tel-00974481.

Full text
Abstract:
This thesis proposes to map users' navigation paths in technology-enhanced learning environments (EIAH) in order to visualize them, to visualize in order to interpret, and to interpret in order to anticipate. Learning profiles influence navigation patterns in an online learning environment. Relying on a methodology capable of modeling a user's navigation path and anticipating his or her next click on a platform, our study seeks to broaden knowledge of the efficiency and performance of learning styles. The methodology rests on the analysis of usage traces drawn from 63 Web log archives, comprising 4,637 registry lines and 13,206 possible module choices. The research draws on approaches combining semiology, information science, cognitive psychology, and education science. Three observations were conducted, generating information on user profiles, the representation of navigation paths, and the impact of learning style on the choice of the work features available on the platform. The main results are of two kinds: on the one hand, the development of a tool that converts log file traces into navigation paths, and on the other, the confirmation of a link between learning style and navigation patterns. This second result makes it possible to develop a method for anticipating a user's next module choice on a digital work platform. Practical applications aimed at making these traces usable in university programs include quality assessments (preferred resources, less-used features) and the identification of needs for pedagogical mediation in understanding a task or process (identified, for example, through repeated visits to the instructions module, the time invested by a group, or repeated paths). This thesis is addressed primarily to university academic decision-makers responsible for integrating ICT and, by extension, to university students and designers of learning tools.
APA, Harvard, Vancouver, ISO, and other styles
32

Wu, Yu-Shan, and 吳郁珊. "A Study Incorporating Data Mining Technologies into Data Classification." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/y3k8gc.

Full text
Abstract:
Master's thesis
National Formosa University
Graduate Institute of Industrial Engineering and Management
98
Support Vector Machines (SVM) have been among the most commonly used classification methods in recent years. The underlying theory originates from Structural Risk Minimization (SRM), a new-generation learning principle based on statistical learning theory. These algorithms are currently applied in various fields, including bioinformatics, image analysis, handwriting recognition, daily-life anomaly analysis, credit card fraud, and surveillance video detection. Classification with SVM is more accurate and more stable than maximum likelihood estimation (MLE), does not suffer from MLE's frequent inconsistencies, and is also more effective for image segmentation. This study therefore used data mining to incorporate SVM for classification, and used Bayesian Networks (BN) and Decision Trees (DT) to analyze four UCI (University of California, Irvine) databases, comparing the results with past studies. The results showed that integrating SVM and DT improved classification accuracy, so building a classification system with this method is valid.
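For readers who want to reproduce this kind of comparison, a minimal scikit-learn sketch follows, using the bundled Iris data in place of the four UCI databases analyzed in the thesis.

```python
# Compare an SVM and a decision tree with 5-fold cross-validation.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
for name, clf in [("SVM", SVC(kernel="rbf")),
                  ("Decision tree", DecisionTreeClassifier(random_state=0))]:
    scores = cross_val_score(clf, X, y, cv=5)   # 5-fold cross-validation
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```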
APA, Harvard, Vancouver, ISO, and other styles
33

Hsu, Wei-chen, and 許維宸. "Applications of Clustering Technologies on Fuzzy Data Mining." Thesis, 2001. http://ndltd.ncl.edu.tw/handle/26014267474588605236.

Full text
Abstract:
Master's thesis
National Taipei University of Technology
Graduate Institute of Production System Engineering and Management
89
It is a modern trend for enterprises to use computers in every business process, with the result that huge amounts of enterprise data are collected. These data have to be analyzed effectively so that useful enterprise knowledge can be retrieved and utilized, a purpose that past technologies could not serve. Data mining is a technology aimed at transforming raw data into valuable information. In the process, different domain experts are needed to provide different information; in most fuzzy data mining research, the fuzzy membership functions must be provided by domain experts. In this thesis, three approaches are proposed to assist in deriving the fuzzy membership functions. We use two-dimensional and one-dimensional SOM (self-organizing map) neural networks, and a combination of the SOM network and the K-means method, to determine the appropriate number of groups for each data attribute. Once the group centers are decided, they are used to construct triangular fuzzy membership functions. Next, a fuzzy association rule algorithm is used to retrieve fuzzy customer behavior knowledge, with support and confidence values used to filter out noise and unimportant attributes. Experiments on raw data from a library are performed to evaluate all the approaches, and fuzzy customer behavior knowledge is successfully retrieved.
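A minimal Python sketch of the final step the abstract describes, turning sorted cluster centers into triangular membership functions; the center values are invented for illustration.

```python
# Build triangular membership functions whose peaks sit at cluster centers.
def triangular(a, b, c):
    """Triangular membership function peaking at b, zero outside (a, c)."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu

centers = sorted([12.0, 30.0, 55.0])   # e.g. cluster centers of one attribute
low = triangular(centers[0] - (centers[1] - centers[0]), centers[0], centers[1])
mid = triangular(centers[0], centers[1], centers[2])
high = triangular(centers[1], centers[2], centers[2] + (centers[2] - centers[1]))

print(mid(30.0), mid(21.0))  # 1.0 at the center, 0.5 halfway to a neighbor
```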
APA, Harvard, Vancouver, ISO, and other styles
34

Chiang, Ming-Wei, and 蔣明為. "Digital Rights Protection Method Based on Data Mining Technologies." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/46226198721741419012.

Full text
APA, Harvard, Vancouver, ISO, and other styles
35

WANG, CHI-HUNG, and 王志宏. "A Study of Insurers Insolvencies with Data Mining Technologies." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/ts734u.

Full text
Abstract:
Master's thesis
Soochow University
Department of Financial Engineering and Actuarial Mathematics
102
The solvency and financial security of the insurance industry involve the public interest. The solvency of an insurance company is critically important for policy-holders, policy beneficiaries, the insurance company itself, and investors. This thesis first introduces the official solvency regulations of the United States (the Commissioner of Insurance), Australia, and Europe, and provides models to study the solvency of an insurance company. Two data mining technologies, neural networks and decision trees, are used to investigate the critical factors affecting the solvency of the insurance industry. The results also show that the neural network technique outperforms the decision tree technique in the empirical models.
APA, Harvard, Vancouver, ISO, and other styles
36

Chiu, Chia-Hsien, and 邱家賢. "Application of Data Mining Technologies for IC Stock Category." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/47854160507221949821.

Full text
Abstract:
Master's thesis
Huafan University
Master's Program, Department of Information Management
97
In this research, four different data mining technologies are applied to the IC stock category, using companies' financial reports as the basic data. The goal is to find a suitable data mining technology that precisely classifies the stocks as having positive or negative reward. The four tested techniques are support vector machines (SVM), support vector machines combined with a genetic algorithm (GA-SVM), decision trees, and back-propagation networks (BPN). Based on simulation results, GA-SVM with feature selection outperforms the other approaches.
APA, Harvard, Vancouver, ISO, and other styles
37

Liao, Zhen-Yu, and 廖振宇. "Applying Data Mining Technologies for Bus Transfer Strategies Evaluation." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/38241458767847250422.

Full text
Abstract:
Master's thesis
Tamkang University
Master's Program, Department of Transportation Management
103
Since the opening of the Taipei MRT, the government has given transfer discounts to users in order to boost the use of public transport. With the EasyCard and the transfer concessions, public transport usage has grown year by year. Past studies have investigated MRT-bus transfer concessions in both directions, but transfers between public buses have rarely been explored. Taking public transport as the future-oriented vision, and with the goal of increasing transfers within public transportation, this study focuses on transfer concessions between buses. We use the EasyCard database as the foundation, together with the bus route data library, for data mining, and we explore the characteristics of short-distance bus trips. Factor analysis shows that passengers' bus routes and fares are the main factors in whether people take public transport. Price elasticity analysis yields an elasticity of 0.60, indicating that the transfer discount does not by itself drive passengers' decisions; rather, it serves as a remedy for the inconvenience caused by bus transfers. Following the data mining results, we conduct scenario analyses, testing three scenarios: overall, grid bus network, and passenger category. In the grid-bus-network transfer-discount scenario, passengers taking community routes must transfer to other routes, so the discounted fare can offset the inconvenience caused by the transfer and encourage greater use of public transportation. For main routes, it is hard for the discount to offset the transfer inconvenience, so those routes need a subsidy. This study demonstrates the feasibility of fare discounts for bus-to-bus transfers and can serve as a model for public transportation policy and transfer fare discount policy.
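A small Python sketch of the price-elasticity calculation behind the reported 0.60 figure, using the midpoint (arc) formula; the fare and ridership numbers are invented for illustration.

```python
# elasticity = (% change in ridership) / (% change in fare), midpoint form
def arc_elasticity(q0, q1, p0, p1):
    dq = (q1 - q0) / ((q0 + q1) / 2)   # midpoint % change in demand
    dp = (p1 - p0) / ((p0 + p1) / 2)   # midpoint % change in price
    return abs(dq / dp)

# invented example: fare drops from NT$15 to NT$7.5 with a transfer
# discount, and transfer trips rise from 1000 to 1500
print(round(arc_elasticity(q0=1000, q1=1500, p0=15.0, p1=7.5), 2))  # 0.6
```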
APA, Harvard, Vancouver, ISO, and other styles
38

HSIAO, SHU-HAN, and 蕭書涵. "A study on applying data mining technologies for recruitments." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/20391276165678129672.

Full text
Abstract:
Master's thesis
Fu Jen Catholic University
Graduate Institute of Management
96
There are many approaches to recruitment, and the interview is one of the most popular. Before the interview, the first step is to sift carefully through the information in résumés to find the right person for each specific position, which is difficult to do without suitable tools. In this research, we apply data mining techniques to recruitment, aiming specifically at filtering résumés. Techniques including discriminant analysis, multivariate adaptive regression splines (MARS), classification and regression trees (CART), and artificial neural networks (ANN) are adopted to build classification models using the variables in a résumé that may influence employee performance. Cross-validation is used with each classification model to understand the relations between the information in the résumé and employee performance. Experimental results showed that these data mining techniques can help companies recruit the right employees, who deliver higher performance.
APA, Harvard, Vancouver, ISO, and other styles
39

Liang, Chih-Jen, and 梁至仁. "Applying Data Mining Technologies to Automotive Diagnostics and Maintenance." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/15440394361277062096.

Full text
Abstract:
Master's thesis
National Taiwan Ocean University
Department of Electrical Engineering
98
Today, fleet management systems provide numerous services. The purpose of this research is to build on the value-added services and application concepts behind such systems by monitoring vehicle driving information and applying data mining technologies. Fuel consumption is used as the principal criterion for establishing energy-saving standards and for diagnosing vehicle condition for maintenance. In this thesis, we implement a web-based chart analysis system that monitors vehicle driving information by applying fuel consumption analysis and data mining technologies. The results not only help drivers amend their driving behaviour and support maintenance-oriented vehicle diagnosis, but also apply statistical regression analysis as a prediction mechanism. The system thus delivers concrete fuel consumption analysis together with prediction, and it can serve as a basic platform for fuel consumption analysis and for diagnosing vehicle condition for maintenance.
APA, Harvard, Vancouver, ISO, and other styles
40

CHUNG, LING-LING, and 鍾玲玲. "The Construction of Text Mining and Data Mining Technologies for Forecasting Endometrial Cancer." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/83994889789596101910.

Full text
Abstract:
Master's thesis
National Chung Cheng University
Graduate Institute of Healthcare Information Management
105
Endometrial cancer has had the fastest-growing incidence rate of any cancer in the last decade and is the most common gynecological cancer. Improvements in diagnostic methods and technology establish an organized and systematic approach that can detect cancer early and make greater progress in cancer prevention and control. The storage of patients' diagnoses and health data has shifted traditional paper-based medical records to electronic medical records, which serve as the main source of medical information in clinical applications, medical education, and research. Objectives: Data mining technology has been widely applied in medical research, and the key findings extracted from data can support medical decision making. This study therefore aims (a) to use text mining technology to explore the factors related to endometrial cancer, and (b) to establish a forecasting model and risk index for endometrial cancer using data mining technology. Methods: 890 cases of endometrial biopsy were collected from a regional teaching hospital in Chiayi City from 2006 to 2015. Among them, 148 cases with the endometrial carcinoma ICD-9 code [182] formed the case group. Forecasting models for endometrial cancer were constructed with decision trees, support vector machines, and logistic regression, and the best-performing classification model was identified using performance indices. Results: The average prediction accuracy rates for endometrial cancer were 96.9% for the support vector machine model, 95.8% for the logistic regression model, and 91.8% for the decision tree model; a risk tree for endometrial cancer was also generalized. Conclusion: In this study, we used medical institution records, including patients' complaints, physical examination findings, ultrasonography, and pathological reports. Through the editing, organization, and analysis processes of text mining, and by exploiting data mining to establish a forecasting model for clinical medicine, the approach can make up for the inadequacies of general statistical analysis and reveal associations between medical records and endometrial cancer, assisting clinicians in patient assessment.
APA, Harvard, Vancouver, ISO, and other styles
41

Chen, Hong-Bin, and 陳鴻斌. "Using Data Mining Technologies to Construct Query Websites of Diseases." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/65716152295711550969.

Full text
Abstract:
Master's thesis
Southern Taiwan University of Science and Technology
Department of Information Management
92
In this thesis, we develop two clustering methods that discover, from patients' diagnosis data, the diseases most likely to cause a given set of symptoms and the symptoms most likely to accompany a given disease. We then construct a website based on these methods: users can input symptoms to query the diseases they most likely indicate, or input a disease to query the symptoms it most likely shows. The results provide very useful information for doctors' diagnoses and for people managing their own health care. Moreover, we propose a Boolean algorithm to improve the performance of the preceding methods.
APA, Harvard, Vancouver, ISO, and other styles
42

Tezng, YungSen, and 曾勇森. "Using Data Mining Technologies to Improve Service Perfromance of Library." Thesis, 2003. http://ndltd.ncl.edu.tw/handle/83069478970804472661.

Full text
Abstract:
Master's thesis
Southern Taiwan University of Science and Technology
Department of Information Management
91
In this thesis, we use two mining methods, each in two directions, to discover the readers best suited to a given book and the books best suited to given readers. With the clustering method, we use the reader records in the library's database, grouping readers by age, academic background, and department to decide which cluster fits each reader. We then designate a book, find the cluster that the book best fits, and identify the readers in that cluster who have never borrowed it; these are the book's most suitable readers. In the other direction, the same method finds the books most suitable for a given reader: after clustering all readers, we examine the books borrowed within a particular cluster and find the books for which a reader in that cluster has no borrowing record; these are the books the reader is likely to need. With the second method, we use sequential pattern mining for the same two tasks. First, based on the records in the library's database, sorted primarily by reader number and secondarily by date, we derive each reader's reading sequence. Then, according to a minimal support threshold, we evaluate all itemsets formed from combinations of borrowed books, and finally delete the subsequences of the maximal sequential itemsets that satisfy the minimal support. If a designated book appears in those maximal sequences, the borrowing sequences can be used to find its best-fitting readers; the same method and process, applied in the other direction, discover the books suited to given readers.
APA, Harvard, Vancouver, ISO, and other styles
43

chi, Gu fang, and 古芳綺. "Litigation Risk Warning Model: The Application of Data Mining Technologies." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/88427020911484553185.

Full text
APA, Harvard, Vancouver, ISO, and other styles
44

Jeng, Yang-Ting, and 鄭仰廷. "Applying Data Mining Technologies to Diagnosis of Automotive Engines Efficiency." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/89722503483179861724.

Full text
Abstract:
Master's thesis
National Taiwan Ocean University
Department of Electrical Engineering
99
In conventional vehicle diagnostics, a skilled technician determines the abnormality or noise of particular parts based on experience: for example, a cylinder misfire, brake pads that are too thin, or the noise caused by worn tire bearings. However, a well-experienced and skilled technician is not always available. With the rapid development of vehicle electronics, vehicles have become complex high-technology systems, and computer technology advances rapidly; applied to a vehicle diagnosis system, it allows the system to monitor, record, and collect a variety of data during vehicle operation and store it in a database. How to accomplish vehicle health diagnosis with vehicle electronics and computer technology is therefore the main topic of this thesis. To this end, the thesis applies Grey Relational Analysis (GRA) to the health diagnosis of vehicles and to inferring vehicle failure symptoms, forming a complete and efficient fault diagnosis system. Furthermore, the thesis uses data mining technology to propose a modified version of the predictive fuel consumption formula introduced in the "AVR-Based Fuel Consumption Gauge"; the modified formula yields smaller errors and more accurate predictions.
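A minimal Python sketch of Grey Relational Analysis as commonly formulated (grey relational coefficients averaged into a grade, with distinguishing coefficient ζ = 0.5); the reference and candidate series are invented and do not come from the thesis.

```python
# Score how closely each comparison series tracks a reference series;
# a higher grey relational grade means a closer match.
def gra(reference, comparisons, zeta=0.5):
    all_deltas = [[abs(r - c) for r, c in zip(reference, comp)]
                  for comp in comparisons]
    d_min = min(min(d) for d in all_deltas)    # global minimum deviation
    d_max = max(max(d) for d in all_deltas)    # global maximum deviation
    grades = []
    for deltas in all_deltas:
        coeffs = [(d_min + zeta * d_max) / (d + zeta * d_max) for d in deltas]
        grades.append(sum(coeffs) / len(coeffs))   # grey relational grade
    return grades

reference = [1.0, 0.8, 0.9, 1.0]          # normalized healthy-engine profile
candidates = [[0.9, 0.8, 0.85, 1.0],      # close to the healthy profile
              [0.5, 0.4, 0.6, 0.7]]       # clearly degraded
print(gra(reference, candidates))         # e.g. [~0.89, ~0.41]
```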
APA, Harvard, Vancouver, ISO, and other styles
45

Lin, Jun-Gu, and 林俊谷. "Applying Data Mining Technologies and RFID to the Fingerprint Identification System." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/73102886066249699130.

Full text
Abstract:
Master's thesis
Huafan University
Master's Program, Department of Information Management
98
With maturing technology, more and more mobile devices can store large amounts of data, so protecting personal data has become a greater issue. Moreover, as the importance of secure identity management has gradually been recognized, biometric identification has begun to draw attention, and fingerprint identification systems in particular are widely used. This thesis applies binarization, thinning, and flow identification to captured fingerprint images to extract fingerprint features, and uses Radio Frequency Identification (RFID) to preserve the integrity of the fingerprint data. RFID's long range, wide angle, and durability help magnify the accuracy of fingerprint identification. The thesis uses data mining techniques, namely a Decision Tree (DT) and Particle Swarm Optimization combined with a Decision Tree (PSO+DT), to test the accuracy of the system. Simulation results show that PSO+DT has better identification ability than DT alone.
APA, Harvard, Vancouver, ISO, and other styles
46

LIN, WEI-CHIH, and 林瑋智. "Application of Data Mining and RFID Technologies to Shopping Paths and Behavior Analysis." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/76588997370844853294.

Full text
Abstract:
Master's thesis
National Chung Hsing University
Department of Computer Science
95
At present, data mining technology can be used in large-scale shopping centers or department stores to help with commodity transaction analysis. With the development of the online shopping market, online shopping has become one of the most active Internet activities, so data mining and Web mining technology can also be used by online retailers to analyze website data, user browsing behavior, and network transactions. The source for Web mining is the customer's browsing record, which can be found in web log files; transaction records, however, cannot. If Web mining technology is to be used to analyze customers' purchase behavior in a physical mall, the major problem is how to collect records of the paths customers walk through the shopping mall. The purpose of this research is therefore to collect such walking path records using RFID technology. At present, building a real RFID environment to collect these records would be expensive, so the only way to reduce the budget is simulation. This research therefore realistically simulates a shopping mall environment and produces the needed data through data generation. First, a dataset generator produces the simulation data; then a Customer Access Matrix (UAM) is used to obtain users' preferred shopping paths; after that, Web Transaction Mining (WTM) collects the walking path records of customers touring the mall and supports commodity transaction analysis.
APA, Harvard, Vancouver, ISO, and other styles
47

Lin, Hsien-En, and 林賢恩. "Integrating Data Mining and Context-Aware Technologies to Construct the Intelligent Shopping Environment." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/6734p5.

Full text
Abstract:
Master's thesis
National Taichung University of Science and Technology
Master's Program, Department of Information Management
102
Take the retail industry as an example: association rules among consumer demands are identified by analyzing massive transaction records, so retailers can devise marketing strategies that strengthen their competitiveness. The context-aware concept aims to reduce the gap between users and information systems, so that a system actively understands users' context and demands and, in return, provides a better experience. Given the attention the power of social media has drawn in recent years, this study integrates the context-aware concept with association algorithms and social media to establish a Context-Aware Recommendation System (CARS). A Simple RSSI Indoor Localization Module (SRILM) locates the user's position, and integrating SRILM with an Apriori Recommendation Module (ARM) provides effective recommended product information. The system is developed for actual contexts; SRILM is a simple positioning method with the advantages of lower cost and easier deployment compared with other indoor positioning methods.
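A minimal Python sketch of the kind of RSSI-based positioning an indoor localization module like SRILM can perform, assuming the standard log-distance path-loss model and a weighted-centroid estimate; the beacon positions and radio constants are invented and not taken from the thesis.

```python
# Invert the log-distance path-loss model to estimate distance from RSSI,
# then estimate the user's position as a distance-weighted centroid of
# known beacon positions.
def rssi_to_distance(rssi, tx_power=-59.0, n=2.0):
    """Log-distance model: rssi = tx_power - 10 * n * log10(d)."""
    return 10 ** ((tx_power - rssi) / (10 * n))

beacons = {(0.0, 0.0): -65.0, (5.0, 0.0): -70.0, (0.0, 5.0): -75.0}

weights = {pos: 1.0 / rssi_to_distance(rssi) for pos, rssi in beacons.items()}
total = sum(weights.values())
x = sum(pos[0] * w for pos, w in weights.items()) / total
y = sum(pos[1] * w for pos, w in weights.items()) / total
print((round(x, 2), round(y, 2)))  # estimated user position
```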
APA, Harvard, Vancouver, ISO, and other styles
48

Chung, Meng-Chieh, and 鍾孟杰. "Using Data Mining and Multi-classification Technologies to Construct Corporate Financial Distress Prediction Models." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/63333682208160421050.

Full text
Abstract:
Master's thesis
Chung Yuan Christian University
Graduate Institute of Information Management
102
In recent years, with globalization and the information age, the business environment has undergone major changes and the overall economic situation has become more difficult, so the likelihood of corporate financial distress has increased year by year. Whether a company can continue to operate is the main reason investors are willing to put money into the capital markets, and a financial crisis is the most critical turning point in whether a company survives. If financial distress can be predicted early, losses to the business, and even to the general public, can be reduced; financial distress prediction models have therefore been developed gradually, and establishing an effective early-warning model of financial distress is an important issue for both academia and practitioners. Past research shows that data mining models outperform traditional statistical models, with decision tree and neural network models among the most popular; in addition, many recent studies integrate classification models to construct multi-classifier warning models, with considerable improvements. We believe this area still has room for further improvement. Based on the above issues, this study proposes three categories of warning model: single classifier, multiple classifiers, and hybrid classifiers. It uses a variety of classification techniques, such as decision trees, neural networks, the nearest-neighbor method, and random forests, combined with the Bagging data sampling technique, to construct multiple financial distress early-warning models and comprehensively analyze their predictive performance. For the experiments, we use the corporate data from the widely recognized University of California, Irvine (UCI) repository, hoping that a more complete and diversified set of financial distress early-warning models will provide a basis for follow-up study by the business and academic communities.
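A minimal scikit-learn sketch of the Bagging-based multi-classifier comparison the abstract describes, using a synthetic dataset in place of the UCI corporate data.

```python
# Compare a single classifier against bagged ensembles of the same base
# learners, all scored with 5-fold cross-validation.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
models = {
    "single decision tree": DecisionTreeClassifier(random_state=0),
    "bagged decision trees": BaggingClassifier(
        DecisionTreeClassifier(), n_estimators=50, random_state=0),
    "bagged k-NN": BaggingClassifier(
        KNeighborsClassifier(), n_estimators=50, random_state=0),
    "random forest": RandomForestClassifier(n_estimators=50, random_state=0),
}
for name, clf in models.items():
    print(name, cross_val_score(clf, X, y, cv=5).mean().round(3))
```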
APA, Harvard, Vancouver, ISO, and other styles
49

Chen, Yu-Song, and 陳裕菘. "The Construction of Text Mining and Data Mining Technologies for Forecasting Exchange Rate-A Case Study of RMB Against NTD." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/29616090326356136392.

Full text
Abstract:
Master's thesis
Fu Jen Catholic University
Master's Program in Applied Statistics, Department of Statistics and Information Science
101
With trade liberalization and the rapid development of trade between countries, exchange rate fluctuations have become important for profits. Taiwan is an island country whose resources must largely be made up through international trade, especially for economic development, so accurately predicting exchange rate trends is very important. China plays an important role, politically, economically, and in many other respects, for Taiwan and other countries, so this study takes RMB against NTD as its example and uses text mining and data mining technologies to establish predictive models. We collected news documents from October 2012 to March 2013 as the text mining database. From the text mining results, we identified the variables potentially related to exchange rate changes, and used correlation analysis and feature selection methods to choose the final modeling variables. Short-term and long-term time series models were built with a composite time series algorithm. The model results show a significant relationship between the short-term predicted value and the past three historical values of the exchange rate itself (RMB against NTD) and the TAIEX, while ARIMA(1,0,1) serves as the long-term prediction model. The model evaluation results show that the prediction model is highly accurate.
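A minimal statsmodels sketch of fitting the ARIMA(1,0,1) long-term model the abstract reports, on a synthetic mean-reverting series standing in for the RMB/NTD rate.

```python
# Fit ARIMA(1,0,1) and produce multi-step forecasts.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
y = np.empty(200)
y[0] = 4.5                                   # invented starting rate
for t in range(1, 200):                      # mean-reverting AR(1) process
    y[t] = 4.5 + 0.95 * (y[t - 1] - 4.5) + rng.normal(scale=0.01)

model = ARIMA(y, order=(1, 0, 1)).fit()      # ARIMA(p=1, d=0, q=1)
print(model.forecast(steps=5))               # five-step-ahead forecasts
```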
APA, Harvard, Vancouver, ISO, and other styles
50

Chiang, Huei-Ming, and 江慧敏. "The Research on Multiple Approaches to College Entrance via Data Mining and Neural Network Technologies." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/2732ga.

Full text
Abstract:
Master's thesis
Chaoyang University of Technology
Master's Program, Department of Information Management
93
The Joint College Entrance Exam in Taiwan has been replaced by Multiple Approaches to College Entrance, which is meant to give students more choices so they can choose the major most suitable for them based on their own interests and capabilities. However, with so many choices available, students often feel confused and do not know what to choose; besides, many high school graduates are not very sure about their own interests, capabilities, or qualifications. The present study, applying neural network and data mining techniques, aims to develop a prediction methodology for admission through the recommendation and exam-based admission systems. With such a prediction and recommendation system, high school graduates can save much time and avoid confusion when choosing their ideal majors. The study consists of two main parts. The first explores the significant factors that contribute to success or failure in admission through school recommendation and establishes a prediction module. The second focuses on the exam-based admission system, with a view to establishing an appropriate and effective recommendation mechanism that helps students choose, among a myriad of university departments and majors, the best one given their individual interests, capabilities, family expectations, and other social factors. The results show that there are indeed influential factors affecting the outcome of school recommendation, and the efficiency of the prediction module is affirmed: measured with association rules, the prediction accuracy of the back-propagation network is as high as 80%. The recommendation mechanism established by applying a collaborative filtering approach proves effective, with over 60% successful recommendations. Both parts of the study affirm the feasibility and implementability of the newly established prediction module and recommendation mechanism.
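A minimal Python sketch of a user-based collaborative filtering recommender of the kind the abstract describes, using Jaccard similarity over invented student choices; the students, departments, and neighborhood size are all illustrative assumptions.

```python
# Recommend departments chosen by the most similar peers.
choices = {                      # student -> set of departments applied to
    "s1": {"CS", "EE", "Math"},
    "s2": {"CS", "EE", "Physics"},
    "s3": {"Law", "Politics"},
}

def jaccard(a, b):
    return len(a & b) / len(a | b)

def recommend(target, choices, k=1):
    # rank other students by similarity to the target, keep the top k
    peers = sorted((s for s in choices if s != target),
                   key=lambda s: jaccard(choices[target], choices[s]),
                   reverse=True)[:k]
    # suggest peers' choices the target has not already made
    return set().union(*(choices[p] for p in peers)) - choices[target]

print(recommend("s1", choices))  # {'Physics'} from the most similar peer
```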
APA, Harvard, Vancouver, ISO, and other styles
