Dissertations / Theses on the topic 'Discovery from data'

Listed below are the top 50 dissertations / theses on the topic 'Discovery from data.'


1

Höppner, Frank. "Knowledge discovery from sequential data." [S.l. : s.n.], 2003. http://deposit.ddb.de/cgi-bin/dokserv?idn=96728421X.

2

Cao, Huiping. "Pattern discovery from spatiotemporal data." E-thesis, The University of Hong Kong (HKUTO), 2006. http://sunzi.lib.hku.hk/hkuto/record/B37381520.

3

Cao, Huiping, and 曹會萍. "Pattern discovery from spatiotemporal data." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2006. http://hub.hku.hk/bib/B37381520.

4

Chau, Tom. "Event level pattern discovery in multivariate continuous data." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1998. http://www.collectionscanada.ca/obj/s4/f2/dsk2/tape15/PQDD_0003/NQ30594.pdf.

5

El, Sayed Ahmed. "Contributions in knowledge discovery from textual data." Lyon 2, 2008. http://theses.univ-lyon2.fr/documents/lyon2/2008/el-sayed_a.

Abstract:
This dissertation focuses on two key issues in text mining, namely unsupervised learning and knowledge acquisition. In spite of their relative maturity, both issues still present major challenges that need to be addressed. First, for unsupervised learning, a well-known, unresolved challenge is to perform clustering with minimal input parameters. One natural way to reach this is to involve validity indices in the clustering process. Although of great interest, validity indices have not been extensively explored in the literature, especially when dealing with high-dimensional data like text. Hence, we make three main contributions: (1) an experimental study extensively comparing eight validity indices; (2) a context-aware method enhancing the use of validity indices as stopping criteria; (3) I-CBC, an incremental version of the CBC (Clustering By Committee) algorithm. The contributions were validated in two real-world applications: document and word clustering. Second, for knowledge acquisition, we face major issues related to ontology learning from text: the low recall of the pattern-based approach, the low precision of the distributional approach, context-dependency, and ontology evolution. Thus, we propose a new framework for taxonomy learning from text. The proposal is a hybrid approach which has the following advantages over the other approaches: (1) the ability to capture relations in text more "flexibly"; (2) concepts better reflecting the context of the target corpus; (3) more reliable decisions during the learning process; and (4) evolution of the learned taxonomy without any manual effort, after its incorporation in the core of an information retrieval system.
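
As an illustrative aside (not code from the thesis): one common way to use a validity index as a clustering stopping criterion is to sweep the number of clusters and keep the value that maximizes the index. A minimal sketch with the silhouette index and scikit-learn, on placeholder data standing in for document vectors:

```python
# Sketch: choose the number of clusters k by maximizing a validity index
# (silhouette here; the thesis compares eight indices, this is only one).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def best_k(X, k_range=range(2, 11)):
    scores = {}
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return max(scores, key=scores.get), scores

X = np.random.rand(200, 50)  # placeholder standing in for TF-IDF document vectors
k, scores = best_k(X)
print("selected k:", k)
```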
6

El Sayed, Ahmed. "Contributions in knowledge discovery from textual data." Supervised by Djamel Abdelkader Zighed. Lyon : Université Lumière Lyon 2, 2008. http://theses.univ-lyon2.fr/sdx/theses/lyon2/2008/el-sayed_a.

7

Wang, Yang. "High-order pattern discovery and analysis of discrete-valued data sets." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1997. http://www.collectionscanada.ca/obj/s4/f2/dsk3/ftp04/nq22245.pdf.

8

Amado, Vanessa. "Knowledge discovery and data mining from freeway section traffic data." Diss., Columbia, Mo. : University of Missouri-Columbia, 2008. http://hdl.handle.net/10355/5591.

9

Páircéir, Rónán. "Knowledge discovery from distributed aggregate data in data warehouses and statistical databases." Thesis, University of Ulster, 2002. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.274398.

10

Caruccio, Loredana. "Relaxed functional dependencies: definition, discovery and applications." Doctoral thesis, Universita degli studi di Salerno, 2018. http://hdl.handle.net/10556/3051.

Abstract:
Functional dependencies (FDs) were conceived in the early '70s, and were mainly used to verify database design and assess data quality. However, to solve several issues in emerging application domains, such as the identification of data inconsistencies, patterns of semantically related data, query rewriting, and so forth, it has been necessary to extend the FD definition... [edited by author]
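
As an illustrative aside (not the thesis's discovery algorithms): one classical way to relax an FD is to tolerate a bounded fraction of violating tuples, measured by the g3 error, i.e. the minimum fraction of tuples that must be removed for the exact FD to hold. A minimal sketch, with hypothetical data:

```python
from collections import Counter, defaultdict

def g3_error(rows, lhs, rhs):
    """Minimum fraction of tuples to remove so that lhs -> rhs holds exactly
    (the classical g3 measure, one way to quantify a relaxed FD)."""
    groups = defaultdict(Counter)
    for r in rows:
        groups[tuple(r[a] for a in lhs)][tuple(r[a] for a in rhs)] += 1
    keep = sum(max(c.values()) for c in groups.values())
    total = sum(sum(c.values()) for c in groups.values())
    return 1 - keep / total

rows = [
    {"zip": "84084", "city": "Fisciano"},
    {"zip": "84084", "city": "Fisciano"},
    {"zip": "84084", "city": "Salerno"},   # violating tuple
]
print(g3_error(rows, ["zip"], ["city"]))   # 0.333...: zip -> city holds if we tolerate 1/3
```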
11

Sun, Feng-Tso. "Nonparametric Discovery of Human Behavior Patterns from Multimodal Data." Research Showcase @ CMU, 2014. http://repository.cmu.edu/dissertations/359.

Abstract:
Recent advances in sensor technologies and the growing interest in context-aware applications, such as targeted advertising and location-based services, have led to a demand for understanding human behavior patterns from sensor data. People engage in routine behaviors. Automatic routine discovery goes beyond low-level activity recognition such as sitting or standing and analyzes human behaviors at a higher level (e.g., commuting to work). The goal of the research presented in this thesis is to automatically discover high-level semantic human routines from low-level sensor streams. One recent line of research is to mine human routines from sensor data using parametric topic models. The main shortcoming of parametric models is that they assume a fixed, pre-specified parameter regardless of the data. Choosing an appropriate parameter usually requires an inefficient trial-and-error model selection process. Furthermore, it is even more difficult to find optimal parameter values in advance for personalized applications. The research presented in this thesis offers a novel nonparametric framework for human routine discovery that can infer high-level routines without knowing the number of latent low-level activities beforehand. More specifically, the framework automatically finds the size of the low-level feature vocabulary from sensor feature vectors at the vocabulary extraction phase. At the routine discovery phase, the framework further automatically selects the appropriate number of latent low-level activities and discovers latent routines. Moreover, we propose a new generative graphical model to incorporate multimodal sensor streams for the human activity discovery task. The hypothesis and approaches presented in this thesis are evaluated on public datasets in two routine domains: two daily-activity datasets and a transportation mode dataset. Experimental results show that our nonparametric framework can automatically learn the appropriate model parameters from multimodal sensor data without any form of manual model selection procedure and can outperform traditional parametric approaches for human routine discovery tasks.
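
As an illustrative aside: the thesis builds on hierarchical Dirichlet process models, which are too involved to sketch here, but the flavor of nonparametric clustering, where the number of clusters grows with the data instead of being fixed in advance, can be shown with the much simpler DP-means algorithm. Purely a stand-in, not the thesis's model:

```python
import numpy as np

def dp_means(X, lam, n_iter=10):
    """DP-means: the cluster count grows with the data; lam is the squared-distance
    penalty controlling how eagerly new clusters are spawned."""
    centroids = [X[0].copy()]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        for i, x in enumerate(X):
            d = [np.sum((x - c) ** 2) for c in centroids]
            j = int(np.argmin(d))
            if d[j] > lam:                  # too far from every centroid: new cluster
                centroids.append(x.copy())
                j = len(centroids) - 1
            labels[i] = j
        for j in range(len(centroids)):     # recompute centroids
            members = X[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return labels, np.array(centroids)

X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
labels, C = dp_means(X, lam=4.0)
print(len(C), "clusters found")
```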
12

Minnen, David. "Unsupervised discovery of activity primitives from multivariate sensor data." Diss., Atlanta, Ga. : Georgia Institute of Technology, 2008. http://hdl.handle.net/1853/24623.

Committee Chair: Thad Starner; Committee Members: Aaron Bobick, Bernt Schiele, Charles Isbell, Irfan Essa.
13

Liang, Huishi. "Knowledge Discovery from Smart Meter Data and Its Applications." Thesis, The University of Sydney, 2021. https://hdl.handle.net/2123/25982.

Abstract:
Smart meters have experienced fast development around the globe during the past decade. Millions of smart meters deployed by many utilities have collected a massive amount of fine-grained electricity consumption data, offering opportunities for revealing insights into the energy consumption characteristics of individual customers. Yet to date, how to extract useful information from this wealth of data is still far from fully investigated. This thesis aims to develop methodologies to mine useful information from smart meter data and apply the discovered knowledge to improve energy efficiency on the customer side and flexibility on the power system side. The thesis consists of five main parts. The first three parts investigate methodologies of knowledge mining from smart meter data and their applications to energy efficiency and demand response (DR) management, while the following two parts apply the knowledge obtained from the former parts to two planning problems. Chapter 2 develops a whole-house level load profiling framework, in which residential load profiles are clustered and indexed into a neat bilevel load shape dictionary (LSD) based on the Derivative Dynamic Time Warping (DDTW) elastic dissimilarity measure. To reduce the computational cost, a fast DDTW (FDDTW) is proposed to speed up the DDTW calculation. Based on the generated bilevel LSD, analytic approaches are proposed to extract features from the data indexed by the LSD to reveal useful information about customers' electricity consumption behaviors, which is further applied to improve load forecasting, tariff design and DR program targeting strategy. Chapter 3 uses smart meter data to develop a scalable methodology for targeting residential customers for energy efficiency (EE) programs that focus on reducing unnecessary domestic energy consumption and replacing low-efficiency refrigerator-freezers. A novel method is proposed to detect always-on load (i.e., power constantly consumed by appliances that are never turned off) segments from daily load profiles of residential customers. Based on the always-on load detection results, indices and analytic approaches are proposed to identify customers with high potential for always-on load energy saving and low-efficiency refrigerator-freezers, which can inform customer targeting for EE programs. Chapter 4 proposes a novel methodology to extract the electricity usage of heating, ventilation, and air conditioning (HVAC) from smart meter data for individual customers. Given the smart meter data of a household and the outdoor temperature data, the proposed algorithm can not only reconstruct the HVAC usage profiles but also provide estimates of the probability of HVAC usage and the HVAC's feature parameters. Based on the disaggregation results, metrics are derived to estimate customers' HVAC usage habits and DR potential, which can further inform DR programs to select customers in a more cost-effective way. Chapter 5 applies the outcome of Chapter 4 to develop a data-driven approach for virtual power plant (VPP) resource planning, in which battery energy storage (BES) sizing and DR customer selection are optimized synergistically to maximize the VPP's profit in the electricity market. Heterogeneity in DR potential across individual customers is considered in the planning framework by utilizing the HVAC usage information extracted from smart meter data. The overall VPP resource planning problem is formulated as a risk-managed, multistage stochastic programming framework to address the uncertainties from intermittent renewable energy sources, load demands, market prices, and DR resources. Case studies demonstrate that jointly optimizing BES and DR customer selection based on the smart meter data mining results can improve the VPP's expected profit under both penalty-charged and penalty-free markets. Chapter 6 develops a robust distribution system expansion planning (DSEP) framework incorporating a data-driven model for DR resources, in which the heterogeneity in individual customers' DR potential is considered by leveraging the HVAC usage information extracted from smart meter data. The relationship between the DR incentive and the DR participation rate is also considered so that differentiated incentives for customers with different DR potentials can be designed in an optimal way. Case studies demonstrate that the proposed DSEP model can substantially reduce the total expansion cost over conventional planning paradigms, highlighting the positive role of the proposed data-driven DR model in the DSEP problem. Overall, this research bridges the gap between smart meter data mining and its applications in the existing literature. The contributions of this thesis lie in two aspects: 1) this research develops several novel data-mining methods to extract useful information from smart meter data, including customers' electricity usage patterns, always-on load, and HVAC usage information, and 2) metrics and analytic approaches are proposed based on the data-mining results to improve EE and DR management, VPP resource planning and distribution system planning. Numerical experiments on real-world data verify the effectiveness of the proposed methodologies.
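
As an illustrative aside (a plain implementation, not the thesis's accelerated FDDTW): derivative dynamic time warping replaces each series by a local derivative estimate (Keogh and Pazzani's formula) and then runs standard DTW on the derivatives:

```python
import numpy as np

def derivative(q):
    # Keogh & Pazzani estimate: d_i = ((q_i - q_{i-1}) + (q_{i+1} - q_{i-1}) / 2) / 2
    d = ((q[1:-1] - q[:-2]) + (q[2:] - q[:-2]) / 2.0) / 2.0
    return np.concatenate(([d[0]], d, [d[-1]]))   # pad the endpoints

def dtw(a, b):
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def ddtw(a, b):
    return dtw(derivative(np.asarray(a, float)), derivative(np.asarray(b, float)))

# Two toy daily load profiles with similar shape but a time shift
print(ddtw([0, 1, 2, 3, 2, 1], [0, 0, 1, 2, 3, 2]))
```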
14

Babbar, Sakshi. "Inferring Anomalies from Data using Bayesian Networks." Thesis, The University of Sydney, 2013. http://hdl.handle.net/2123/9371.

Abstract:
Existing studies on data mining have largely focused on the design of measures and algorithms to identify outliers in large and high-dimensional categorical and numeric databases. However, not much stress has been given to the interestingness of the reported outliers. One way to ascertain the interestingness and usefulness of a reported outlier is by making use of domain knowledge. In this thesis, we present measures to discover outliers based on background knowledge, represented by a Bayesian network. Using causal relationships between attributes encoded in the Bayesian framework, we demonstrate that meaningful outliers, i.e., outliers which encode important or new information, are those which violate causal relationships encoded in the model. Depending upon the nature of the data, several approaches are proposed to identify and explain anomalies using Bayesian knowledge. Outliers are often identified as data points which are "rare", "isolated", or "far away from their nearest neighbors". We show that these characteristics may not be an accurate way of describing interesting outliers. Through a critical analysis of several existing outlier detection techniques, we show why there is a mismatch between outliers as entities described by these characteristics and "real" outliers as identified using the Bayesian approach. We show that the Bayesian approaches presented in this thesis have better accuracy in mining genuine outliers while keeping a low false positive rate compared to traditional outlier detection techniques.
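
As an illustrative aside (a toy example, not the thesis's measures): given a Bayesian network over the attributes, a record can be scored by its joint probability under the network, and records with very low likelihood, i.e. those violating the encoded dependencies, become candidate outliers. A hand-rolled two-node network with made-up probabilities:

```python
# Toy network Smoker -> Cancer with hand-specified (made-up) probabilities.
p_smoker = {True: 0.3, False: 0.7}
p_cancer_given_smoker = {True: {True: 0.10, False: 0.90},
                         False: {True: 0.01, False: 0.99}}

def joint(record):
    """Joint probability of a record under the network."""
    s, c = record["smoker"], record["cancer"]
    return p_smoker[s] * p_cancer_given_smoker[s][c]

data = [{"smoker": False, "cancer": True},    # improbable under the model
        {"smoker": True, "cancer": False}]
threshold = 0.05
for r in data:
    print(r, round(joint(r), 3), "OUTLIER" if joint(r) < threshold else "ok")
```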
15

Durbha, Surya Srinivas. "Semantics-enabled framework for knowledge discovery from Earth observation data." Diss., Mississippi State : Mississippi State University, 2006. http://sun.library.msstate.edu/ETD-db/ETD-browse/browse.

16

Nagao, Katashi, Katsuhiko Kaji, and Toshiyuki Shimizu. "Discussion Mining : Knowledge Discovery from Data on the Real World Activities." INTELLIGENT MEDIA INTEGRATION NAGOYA UNIVERSITY / COE, 2004. http://hdl.handle.net/2237/10350.

17

Salviato, Elisa. "Computational methods for the discovery of molecular signatures from Omics Data." Doctoral thesis, Università degli studi di Padova, 2018. http://hdl.handle.net/11577/3421961.

Abstract:
Molecular biomarkers, derived from high-throughput technologies, are the foundations of "next-generation" precision medicine. Despite a decade of intense efforts and investments, the number of clinically valid biomarkers remains modest. Indeed, the "big-data" nature of omics data presents new challenges that require an improvement in the strategies of data analysis and interpretation. In this thesis, two themes are proposed, both aimed at improving the statistical and computational methodology in the field of signature discovery. The first work aims at identifying serum miRNAs to be used as diagnostic biomarkers associated with ovarian cancer. In particular, a guideline and an ad-hoc microarray normalization strategy for the analysis of circulating miRNAs are proposed. In the second work, a new approach for the identification of functional molecular signatures based on Gaussian graphical models is presented. The model can explore the topological information contained in biological pathways and highlight the potential sources of differential behavior between two experimental conditions.
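
As an illustrative aside (the thesis's method is pathway-specific; this shows only the underlying object): a Gaussian graphical model can be estimated from an expression matrix with the graphical lasso, where nonzero off-diagonal entries of the precision matrix correspond to conditional dependencies between genes. A sketch with scikit-learn and synthetic data:

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 8))    # placeholder: samples x genes expression matrix
X[:, 1] += 0.8 * X[:, 0]             # induce a dependency between genes 0 and 1

model = GraphicalLasso(alpha=0.2).fit(X)
P = model.precision_                 # sparse precision (inverse covariance) matrix
edges = [(i, j) for i in range(P.shape[0]) for j in range(i + 1, P.shape[1])
         if abs(P[i, j]) > 1e-6]
print("conditional-dependence edges:", edges)
```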
18

Cece, Esra Nurten 1984. "Metabolite identification in drug discovery : from data to information and from information to knowledge." Doctoral thesis, Universitat Pompeu Fabra, 2017. http://hdl.handle.net/10803/403648.

Abstract:
Drug metabolism studies provide the opportunity to enhance the metabolic properties of new drugs. The overall aims of drug metabolism studies are to (1.) optimize the pharmacokinetic properties of drug candidates, (2.) characterize the polymorphic enzyme contribution to clearance, and (3.) support the selection of safe drugs with respect to bioactivation potential. The ultimate goal of drug metabolism assays in early drug discovery is to translate analytical data into final knowledge. Through this translation, metabolism scientists can rationalize how the structures of new drug compounds could be changed and how metabolic pathways could be better understood. Analytical techniques, such as High Resolution Mass Spectrometry (HRMS), have progressed, and it is now possible to generate large datasets through High Throughput Screening (HTS) assays in drug metabolism laboratories. However, the transformation of these data into information, and of information into knowledge, is insufficient. In-depth data inspection is necessary to support the generation of high-quality results which are consistent across experiments. For this purpose, innovative software solutions can be utilized to process analytical data. In this respect, by applying standardized and fully automated data evaluation tools, it is possible to (1.) enable comprehensive data analysis, (2.) accelerate structure-based information handling, (3.) eliminate human error and finally (4.) improve the chemical features of lead molecules in terms of biotransformation properties. This thesis research aimed to use a novel automated workflow within HRMS to identify drug metabolites and their structures. The final results confirmed that this new workflow can be used to translate HRMS data into information, which is required for building useful and ultimate knowledge in drug metabolism.
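
As an illustrative aside (a toy version of one step such workflows automate, not the software used in the thesis): candidate metabolite peaks can be matched against the parent drug mass shifted by common biotransformations, within a ppm tolerance. The parent mass and peak list below are hypothetical:

```python
SHIFTS = {  # monoisotopic mass shifts of some common biotransformations (Da)
    "oxidation (+O)":         +15.9949,
    "demethylation (-CH2)":   -14.0157,
    "dehydrogenation (-H2)":   -2.0157,
    "glucuronidation":       +176.0321,
}

def match_metabolites(parent_mass, peaks, tol_ppm=5.0):
    """Match observed peaks against parent mass + known biotransformation shifts."""
    hits = []
    for name, shift in SHIFTS.items():
        expected = parent_mass + shift
        for mz in peaks:
            if abs(mz - expected) / expected * 1e6 <= tol_ppm:
                hits.append((name, mz))
    return hits

parent = 309.1477                        # hypothetical drug, neutral monoisotopic mass
peaks = [325.1421, 295.1320, 485.1798]   # hypothetical peak list
print(match_metabolites(parent, peaks))
```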
19

Venkatasubramanian, Meenakshi. "De novo Population Discovery from Complex Biological Datasets." University of Cincinnati / OhioLINK, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1563873297599047.

20

Zhou, Mu. "Knowledge Discovery and Predictive Modeling from Brain Tumor MRIs." Scholar Commons, 2015. http://scholarcommons.usf.edu/etd/5809.

Abstract:
Quantitative cancer imaging is an emerging field that develops computational techniques to acquire a deep understanding of cancer characteristics for cancer diagnosis and clinical decision making. The recent emergence of growing clinical imaging data provides a wealth of opportunity to systematically explore quantitative information to advance cancer diagnosis. Crucial questions arise as to how we can develop specific computational models that are capable of mining meaningful knowledge from a vast quantity of imaging data, and how to transform such findings into improved personalized health care. This dissertation presents a set of computational models in the context of a malignant brain tumor, Glioblastoma Multiforme (GBM), which is notoriously aggressive with a poor survival rate. In particular, this dissertation developed quantitative feature extraction approaches for tumor diagnosis from magnetic resonance imaging (MRI), including a multi-scale local computational feature and a novel regional habitat quantification analysis of tumors. In addition, we proposed a histogram-based representation to investigate biological features to characterize ecological dynamics, which is of great clinical interest in evaluating tumor cellular distributions. Furthermore, in regards to clinical systems, generic machine learning techniques are typically incapable of generalizing well to specific diagnostic problems. Therefore, quantitative analysis from a data-driven perspective is becoming critical. In this dissertation, we propose two specific data-driven models to tackle different types of clinical MRI data. First, we inspected cancer systems from a time-domain perspective. We propose a quantitative histogram-based approach that builds a prediction model, measuring the differences between pre- and post-treatment diagnostic MRI data. Second, we investigated the problem of mining knowledge from a skewed distribution, where data samples of each survival group are unequally distributed. We proposed an algorithmic framework to effectively predict survival groups by jointly considering imbalanced distributions and classifier design. Our approach achieved an accuracy of 95.24%, suggesting it captures class-specific information in a challenging clinical setting.
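
As an illustrative aside (a generic ingredient, not the thesis's algorithm): one standard way to jointly consider an imbalanced distribution and classifier design is to reweight classes inversely to their frequency. A sketch with scikit-learn and placeholder features:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.standard_normal((120, 20))                  # placeholder imaging feature vectors
y = np.r_[np.zeros(100), np.ones(20)].astype(int)   # skewed survival groups (100 vs 20)

# class_weight="balanced" penalizes errors on the rare class more heavily
clf = RandomForestClassifier(n_estimators=200, class_weight="balanced", random_state=0)
print(cross_val_score(clf, X, y, cv=5, scoring="balanced_accuracy").mean())
```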
21

Le, Van Quoc Anh [author], and Michael Gertz [academic supervisor]. "Pattern Discovery from Event Data / Anh Le Van Quoc ; Supervisor: Michael Gertz." Heidelberg : Universitätsbibliothek Heidelberg, 2014. http://d-nb.info/1180032594/34.

22

Kavasidis, Isaak. "Multifaceted analysis for medical data understanding: from data acquisition to multidimensional signal processing to knowledge discovery." Doctoral thesis, Università di Catania, 2016. http://hdl.handle.net/10761/3925.

Abstract:
Large quantities of medical data are routinely generated each day in the form of text, images and time signals, making evident the need to develop new methodologies not only for the automation of the processing and management of such data, but also for the deeper understanding of the concepts hidden therein. The main problem that arises is that the acquired data are not always in an appropriate state or of appropriate quality for quantitative analysis, and further processing is often necessary in order to enable automatic processing and management as well as to increase the accuracy of the results. Also, given the multimodal nature of medical data, uniform approaches no longer apply, and specific algorithm pipelines should be conceived and developed for each case. In this dissertation we tackle some of the problems that occur in the medical domain regarding different data modalities, and an attempt to understand the meaning of these data is made. These problems range from cortical brain signal acquisition and processing to X-Ray image analysis to text and genomics data-mining and subsequent knowledge discovery.
23

Radovanovic, Aleksandar. "Concept Based Knowledge Discovery from Biomedical Literature." Thesis, Online access, 2009. http://etd.uwc.ac.za/usrfiles/modules/etd/docs/etd_gen8Srv25Nme4_9861_1272229462.pdf.

24

Elsilä, U. (Ulla). "Knowledge discovery method for deriving conditional probabilities from large datasets." Doctoral thesis, University of Oulu, 2007. http://urn.fi/urn:isbn:9789514286698.

Abstract:
In today's world, enormous amounts of data are being collected every day. Thus, the problems of storing, handling, and utilizing the data are faced constantly. As the human mind itself can no longer interpret such vast datasets, methods for extracting useful and novel information from the data are needed and developed. These methods are collectively called knowledge discovery methods. In this thesis, a novel combination of feature selection and data modeling methods is presented in order to help with this task. This combination includes the methods of basic statistical analysis, linear correlation, self-organizing map, parallel coordinates, and k-means clustering. The presented method can be used, first, to select the most relevant features from even hundreds of them and, then, to model the complex inter-correlations within the selected ones. The capability to handle hundreds of features opens up the possibility to study more extensive processes instead of just looking at smaller parts of them. The results of a k-nearest-neighbors study show that the presented feature selection procedure is valid and appropriate. A second advantage of the presented method is the possibility to use thousands of samples. Whereas the current rules of selecting appropriate limits for utilizing the methods are theoretically proved only for small sample sizes, especially in the case of linear correlation, this thesis gives the guidelines for feature selection with thousands of samples. A third positive aspect is the nature of the results: given that the outcome of the method is a set of conditional probabilities, the derived model is highly unrestrictive and rather easy to interpret. In order to test the presented method in practice, it was applied to study two different cases of steel manufacturing with hot strip rolling. In the first case, the conditional probabilities for different types of retentions were derived and, in the second case, the rolling conditions for the occurrence of wedge were revealed. The results of both of these studies show that steel manufacturing processes are indeed very complex and highly dependent on the various stages of the manufacturing. This was further confirmed by the fact that with studies of k-nearest-neighbors and C4.5, it was impossible to derive useful models concerning the datasets as a whole. It is believed that the reason for this lies in the nature of these two methods, meaning that they are unable to grasp such manifold inter-correlations in the data. On the contrary, the presented method of conditional probabilities allowed new knowledge to be gained of the studied processes, which will help to better understand these processes and to enhance them.
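
As an illustrative aside (not the thesis's pipeline, which combines SOM, parallel coordinates and k-means): the end product, a set of conditional probabilities, can be estimated from binned process data with a row-normalized cross-tabulation. Column names and values below are placeholders:

```python
import pandas as pd

# Placeholder process data: one row per rolled steel strip
df = pd.DataFrame({
    "rolling_temp": [840, 905, 870, 910, 860, 900, 845, 915],
    "wedge":        [1,   0,   1,   0,   1,   0,   1,   0],
})
df["temp_bin"] = pd.cut(df["rolling_temp"], bins=[800, 875, 950],
                        labels=["low", "high"])

# P(wedge | temp_bin): row-normalized cross-tabulation
cond = pd.crosstab(df["temp_bin"], df["wedge"], normalize="index")
print(cond)
```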
25

Hahn, Jasper. "From Discovery to Purchase: Improving the User Experience for Buyers in eCommerce." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-177473.

Abstract:
The Internet has revolutionized many areas of our lives. New forms of exchanging and retrieving information, doing business, and communicating in general have been made possible by the Internet and have gone through rapid development since its creation. In an age of nearly ubiquitous access to the Internet, with a majority of the western world actively using social media, retail markets have changed, too. But compared to the rapidly changing services in other sectors, retail businesses have only converted an existing model to a new technology rather than coming up with a new one. Social commerce is an approach that wants to change that. It takes into account lessons learned from social media and shifting marketing strategies, and tries to create a better shopping experience for customers while giving brands and fashion influencers a new platform to engage with them. This thesis project uses literature from different fields, such as interaction design, online marketing and fashion, along with user interviews to identify the most important aspects that will lead towards a more social online shopping experience, particularly in fashion. It is conducted in collaboration with the local start-up Apprl (www.apprl.com) and includes an implementation part realizing the identified most promising features as part of the agile development process within the company. The field of social commerce promises to radically change the way we buy things online, and Apprl is one of many examples trying to make that happen.
26

Piekenbrock, Matthew J. "Discovering Intrinsic Points of Interest from Spatial Trajectory Data Sources." Wright State University / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=wright1527160689990512.

27

Deirmenci, Hazim. "Enabling Content Discovery in an IPTV System : Using Data from Online Social Networks." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-200922.

Abstract:
Internet Protocol television (IPTV) is a way of delivering television over the Internet, which enables two-way communication between an operator and its users. By using IPTV, users have the freedom to choose what content they want to consume and when they want to consume it. For example, users are able to watch TV shows after they have been aired on TV, and they can access content that is not part of any linear TV broadcast, e.g. movies that are available to rent. This means that, by using IPTV, users can get access to more video content than is possible with the traditional TV distribution formats. However, having more options also means that deciding what to watch becomes more difficult, and it is important that IPTV providers facilitate the process of finding interesting content so that the users find value in using their services. In this thesis, the author investigated how a user's online social network can be used as a basis for facilitating the discovery of interesting movies in an IPTV environment. The study consisted of two parts, one theoretical and one practical. In the theoretical part, a literature study was carried out in order to obtain knowledge about different recommender system strategies. In addition to the literature study, a number of online social network platforms were identified and empirically studied in order to gain knowledge about what data it is possible to gather from them, and how the data can be gathered. In the practical part, a prototype content discovery system, which made use of the gathered data, was designed and built. This was done in order to uncover difficulties that exist with implementing such a system. The study shows that, while it is possible to gather data from different online social networks, not all of them offer data in a form that is easy to make use of in a content discovery system. Out of the investigated online social networks, Facebook was found to offer the data that is easiest to gather and make use of. The biggest obstacle, from a technical point of view, was found to be the matching of movie titles gathered from the online social network with the movie titles in the database of the IPTV service provider; one reason for this is that movies can have titles in different languages.
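
As an illustrative aside (not the thesis's implementation): the title-matching obstacle described above can be approached with fuzzy string matching; the standard library's difflib gives a minimal baseline, though it does nothing for titles translated into another language. Catalogue contents below are placeholders:

```python
import difflib

catalogue = ["The Godfather", "La Dolce Vita", "Seven Samurai"]   # placeholder titles
liked_on_facebook = ["Godfather, The", "Die sieben Samurai"]

for title in liked_on_facebook:
    match = difflib.get_close_matches(title, catalogue, n=1, cutoff=0.6)
    print(title, "->", match or "no match (possibly a foreign-language title)")
```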
28

Bakhtyar, Shoaib. "A Knowledge Graph Approach Towards Hidden Patterns Discovery From Biomedical Publications." Thesis, Luleå tekniska universitet, Institutionen för system- och rymdteknik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-87267.

Abstract:
Biomedical research publications often include significant scientific advancements in the biomedical domain. These publications can be interlinked in different aspects, such as having common keywords, covering a similar theme/topic, being published by the same author(s), funded by a common funding agency, or belonging to a common project. Further, the visibility of links between different publications can be significantly useful to researchers; e.g., the keywords that interlink different authors can be useful for finding relevant biomedical researchers with common interests. However, it is difficult to interlink these biomedical publications since their bibliographic information is stored and accessible in fragments in research repositories. Hence, there is a need to investigate how to interlink biomedical publications to uncover hidden patterns in order to achieve transparency in research. This study, following the design science methodology, investigates a knowledge graph based approach to interlink biomedical publications for uncovering hidden patterns. The study focuses on a use-case of biomedical publications by Örebro University between the years 1973-2021, which are 16626 in total. Biomedical concepts, author and affiliation details, project details, keywords, titles, and funding agency details are extracted and conceptually modelled into a knowledge graph, which is later implemented in a graph database, i.e., Neo4j. Through demonstration of different queries on the database, this study finds that the implemented artefact enables greater visibility of links between publications, which in turn leads to visibility of hidden patterns between publications, authors, biomedical entities, funding organizations, and projects. Furthermore, the artefact in this study presents information visually to users, which makes the results more transparent and easy to grasp. The artefact can be useful to both researchers and decision makers at an institute or funding agency; e.g., researchers can find other researchers or a potential funding agency based on a common interest, whereas decision makers will be able to get information about authors and funding details that are interlinked.
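
As an illustrative aside: a query of the kind such a graph supports, written in Cypher and run through the official Neo4j Python driver. The node labels and relationship types are assumptions for illustration, not necessarily the thesis's schema:

```python
from neo4j import GraphDatabase

# Hypothetical schema: (:Author)-[:AUTHORED]->(:Publication)-[:HAS_KEYWORD]->(:Keyword)
QUERY = """
MATCH (a:Author)-[:AUTHORED]->(:Publication)-[:HAS_KEYWORD]->(k:Keyword)
      <-[:HAS_KEYWORD]-(:Publication)<-[:AUTHORED]-(b:Author)
WHERE a.name = $name AND a <> b
RETURN b.name AS author, collect(DISTINCT k.term) AS shared_keywords
LIMIT 10
"""

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    for record in session.run(QUERY, name="Jane Doe"):
        print(record["author"], record["shared_keywords"])
driver.close()
```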
29

Parvinzamir, Farzad. "A visual analytics approach for visualisation and knowledge discovery from time-varying personal life data." Thesis, University of Bedfordshire, 2018. http://hdl.handle.net/10547/622697.

Abstract:
Today, the importance of big data from lifestyles and work activities has been the focus of much research. At the same time, advances in modern sensor technologies have enabled self-logging of a significant number of daily activities and movements. Lifestyle logging produces a wide variety of personal data along the lifespan of individuals, including locations, movements, travel distance, step counts and the like, and can be useful in many areas such as healthcare, personal life management, memory recall, and socialisation. However, the amount of obtainable personal life logging data has enormously increased and stands in need of effective processing, analysis, and visualisation to uncover hidden insights, owing to the lack of semantic information (particularly in spatiotemporal data), complexity, the large volume of trivial records, and the absence of effective information visualisation on a large scale. Meanwhile, new technologies such as visual analytics have emerged with great potential in data mining and visualisation to overcome the challenges in handling such data and to support individuals in many aspects of their life. Thus, this thesis contemplates the importance of scalability and conducts a comprehensive investigation into visual analytics and its impact on the process of knowledge discovery within the European Commission project MyHealthAvatar at the Centre for Visualisation and Data Analytics, actively involving individuals in order to establish credible reasoning and effectual interactive visualisation of such multivariate data, with particular focus on lifestyle and personal events. To this end, this work widely reviews the foremost existing work on data mining (with a particular focus on semantic enrichment and ranking), data visualisation (of time-oriented, personal, and spatiotemporal data), and methodical evaluations of such approaches. Subsequently, a novel automated place annotation is introduced with multilevel probabilistic latent semantic analysis to automatically attach relevant information to the collected personal spatiotemporal data with low or no semantic information, in order to address the inadequate information, which is essential for the process of knowledge discovery. Correspondingly, a multi-significance event ranking model is introduced, involving a number of factors as well as individuals' preferences, which can influence the result within the process of analysis towards credible and high-quality knowledge discovery. The data mining models are assessed in terms of accuracy and performance. The results showed that both models are highly capable of enriching the raw data and providing significant events based on user preferences. An interactive visualisation is also designed and implemented, including a set of novel visual components based significantly upon human perception and attentiveness to visualise the extracted knowledge. Each visual component is evaluated iteratively based on usability and perceptibility in order to enhance the visualisation towards reaching the goal of this thesis. Lastly, three integrated visual analytics tools (platforms) are designed and implemented in order to demonstrate how the data mining models and interactive visualisation can be exploited to support different aspects of personal life, such as lifestyle, life pattern, and memory recall (reminiscence). The results of the evaluation of the three integrated visual analytics tools showed that this visual analytics approach can deliver a remarkable experience in gaining knowledge and supporting users' lives in certain aspects.
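
As an illustrative aside (the thesis's actual model and weights are not reproduced here; factor names below are hypothetical): a multi-factor event ranking can be sketched as a weighted sum of normalized factor scores:

```python
# Hypothetical factors and weights for ranking personal life events
WEIGHTS = {"duration": 0.3, "rarity": 0.4, "user_preference": 0.3}

def significance(event):
    """Weighted sum of normalized factor scores in [0, 1]."""
    return sum(WEIGHTS[f] * event[f] for f in WEIGHTS)

events = [
    {"name": "commute",       "duration": 0.6, "rarity": 0.1, "user_preference": 0.2},
    {"name": "marathon race", "duration": 0.8, "rarity": 0.9, "user_preference": 0.9},
]
for e in sorted(events, key=significance, reverse=True):
    print(e["name"], round(significance(e), 2))
```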
30

Smith, Tynan S. "Unsupervised discovery of human behavior and dialogue patterns in data from an online game." Thesis, Massachusetts Institute of Technology, 2011. http://hdl.handle.net/1721.1/76999.

Abstract:
A content authoring bottleneck in AI, coupled with improving technology, has led to increasing efforts in using large datasets to power AI systems directly. This idea is being used to create AI agents in video games, using logs of human-played games as the dataset. This new approach to AI brings its own challenges, particularly the need to annotate the datasets used. This thesis explores annotating the behavior in human-played games automatically, namely: how can we generate a list of events, with examples, describing the behavior in thousands of games. First, dialogue is clustered semantically to simplify the game logs. Next, sequential pattern mining is used to find action-dialogue sequences that correspond to higher-level events. Finally, these sequences are grouped according to their event. The system cannot yet replace human annotation, but the results are promising and can already help to significantly reduce the amount of human effort needed.
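
As an illustrative aside (a toy version of the middle step; the thesis applies proper sequential pattern mining over semantically clustered dialogue): counting how many sessions contain each ordered pair of actions already yields frequent length-2 patterns:

```python
from collections import Counter
from itertools import combinations

def frequent_ordered_pairs(sessions, min_support=0.5):
    """Support of each ordered action pair (a before b), counted once per session."""
    counts = Counter()
    for actions in sessions:
        counts.update(set(combinations(actions, 2)))
    n = len(sessions)
    return {p: c / n for p, c in counts.items() if c / n >= min_support}

sessions = [
    ["greet", "ask_price", "haggle", "buy"],
    ["greet", "browse", "ask_price", "buy"],
    ["greet", "leave"],
]
print(frequent_ordered_pairs(sessions))   # e.g. ('greet', 'buy') appears in 2/3 of sessions
```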
31

Iorio, Francesco. "Automatic discovery of drug mode of action and drug repositioning from gene expression data." Doctoral thesis, Universita degli studi di Salerno, 2011. http://hdl.handle.net/10556/976.

Abstract:
The identification of the molecular pathway that is targeted by a compound, combined with the dissection of the following reactions in the cellular environment, i.e. the drug mode of action, is a key challenge in biomedicine. Elucidation of drug mode of action has been attempted, in the past, with different approaches. Methods based only on transcriptional responses are those requiring the least amount of information and can be quickly applied to new compounds. On the other hand, they have met with limited success and, at present, a general, robust and efficient gene-expression based method to study drugs in mammalian systems is still missing. We developed an efficient analysis framework to investigate the mode of action of drugs by using gene expression data only. Particularly, by using a large compendium of gene expression profiles following treatments with more than 1,000 compounds on different human cell lines, we were able to extract a synthetic consensual transcriptional response for each of the tested compounds. This was obtained by developing an original rank merging procedure. Then, we designed a novel similarity measure among the transcriptional responses to each drug, ending up with a "drug similarity network", where each drug is a node and edges represent significant similarities between drugs. By means of a novel hierarchical clustering algorithm, we then provided this network with a modular topology, containing groups of highly interconnected nodes (i.e. network communities) whose exemplars form second-level modules (i.e. network rich-clubs), and so on. We showed that these topological modules are enriched for a given mode of action and that the hierarchy of the resulting final network reflects the different levels of similarity among the composing compound modes of action. Most importantly, by integrating a novel drug X into this network (which can be done very quickly), the unknown mode of action can be inferred by studying the topology of the subnetwork surrounding X. Moreover, novel potential therapeutic applications can be assigned to safe and approved drugs that are already present in the network by studying their neighborhood (i.e. drug repositioning), hence in a very cheap, easy and fast way, without the need for additional experiments. By using this approach, we were able to correctly classify novel anti-cancer compounds; to predict and experimentally validate an unexpected similarity in the mode of action of CDK2 inhibitors and Topoisomerase inhibitors; and to predict that Fasudil, a known and FDA-approved cardiotonic agent, could be repositioned as a novel enhancer of cellular autophagy. Due to the extremely safe profile of this drug and its potential ability to traverse the blood-brain barrier, this could have strong implications for the treatment of several human neurodegenerative disorders, such as Huntington's and Parkinson's diseases. [edited by author]
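
As an illustrative aside (the thesis develops its own, original rank-merging procedure; this is only a classical stand-in): ranked gene lists from replicate treatments can be merged with a Borda count:

```python
from collections import defaultdict

def borda_merge(rankings):
    """Merge ranked lists: each list awards len(list) - position points per item."""
    scores = defaultdict(float)
    for ranking in rankings:
        for pos, gene in enumerate(ranking):
            scores[gene] += len(ranking) - pos
    return sorted(scores, key=scores.get, reverse=True)

replicates = [                       # hypothetical ranked gene lists from replicates
    ["TP53", "MYC", "EGFR", "BRCA1"],
    ["MYC", "TP53", "BRCA1", "EGFR"],
    ["TP53", "EGFR", "MYC", "BRCA1"],
]
print(borda_merge(replicates))       # consensus ordering (TP53 first here)
```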
32

De, Wilde Max. "From Information Extraction to Knowledge Discovery: Semantic Enrichment of Multilingual Content with Linked Open Data." Doctoral thesis, Universite Libre de Bruxelles, 2015. http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/218774.

Abstract:
Discovering relevant knowledge out of unstructured text is not a trivial task. Search engines relying on full-text indexing of content reach their limits when confronted with poor quality, ambiguity, or multiple languages. Some of these shortcomings can be addressed by information extraction and related natural language processing techniques, but these still fall short of adequate knowledge representation. In this thesis, we defend a generic approach striving to be as language-independent, domain-independent, and content-independent as possible. To reach this goal, we propose to disambiguate terms with their corresponding identifiers in Linked Data knowledge bases, paving the way for full-scale semantic enrichment of textual content. The added value of our approach is illustrated with a comprehensive case study based on a trilingual historical archive, addressing constraints of data quality, multilingualism, and language evolution. A proof-of-concept implementation is also proposed in the form of a Multilingual Entity/Resource Combiner & Knowledge eXtractor (MERCKX), demonstrating to a certain extent the general applicability of our methodology to any language, domain, and type of content.
33

Wickramarathne, Thanuka Lakmal. "A Belief Theoretic Approach for Automated Collaborative Filtering." Scholarly Repository, 2008. http://scholarlyrepository.miami.edu/oa_theses/182.

Abstract:
Automated Collaborative Filtering (ACF) is one of the most successful strategies available for recommender systems. The application of ACF in more sensitive and critical settings, however, has been hampered by the absence of better mechanisms to accommodate imperfections (ambiguities and uncertainties in ratings, missing ratings, etc.) that are inherent in user preference ratings and to propagate such imperfections throughout the decision making process. Thus one is compelled to make various "assumptions" regarding the user preferences, giving rise to predictions that lack sufficient integrity. With its Dempster-Shafer belief theoretic basis, CoFiDS, the automated collaborative filtering algorithm proposed in this thesis, can (a) represent a wide variety of data imperfections; (b) propagate the partial knowledge that such data imperfections generate throughout the decision-making process; and (c) conveniently incorporate contextual information from multiple sources. The "soft" predictions that CoFiDS generates provide substantial flexibility to the domain expert. Depending on the associated DS theoretic belief-plausibility measures, the domain expert can either render a "hard" decision or narrow down the possible set of predictions to as small a set as necessary. With its capability to accommodate data imperfections, CoFiDS widens the applicability of ACF from the more popular domains, such as movie and book recommendations, to more sensitive and critical problem domains, such as medical expert support systems, homeland security and surveillance, etc. We use a benchmark movie dataset and a synthetic dataset to validate CoFiDS and compare it to several existing ACF systems.
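
As an illustrative aside (a generic building block, not CoFiDS itself): Dempster's rule of combination for two mass functions over a small frame of discernment, with hypothetical evidence about a user's rating:

```python
from itertools import product

def dempster_combine(m1, m2):
    """Dempster's rule for two mass functions over frozenset focal elements."""
    combined, conflict = {}, 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + x * y
        else:
            conflict += x * y     # mass assigned to contradictory intersections
    if conflict >= 1.0:
        raise ValueError("total conflict; the sources cannot be combined")
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

# Hypothetical evidence about a user's rating of one movie: "like" vs "dislike"
m1 = {frozenset({"like"}): 0.6, frozenset({"like", "dislike"}): 0.4}
m2 = {frozenset({"like"}): 0.5, frozenset({"dislike"}): 0.2,
      frozenset({"like", "dislike"}): 0.3}
print(dempster_combine(m1, m2))
```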
APA, Harvard, Vancouver, ISO, and other styles
34

Tomczak, Jakub. "Algorithms for knowledge discovery using relation identification methods." Thesis, Blekinge Tekniska Högskola, Sektionen för datavetenskap och kommunikation, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-2563.

Full text
Abstract:
This work presents a coherent survey of problems connected with relational knowledge representation and of methods for achieving it. The proposed approach is demonstrated on three applications: an economic case, a biomedical case, and a benchmark dataset. All crucial definitions are formulated and three main methods for the relation identification problem are described. Moreover, different identification methods are presented for specific relational models and observation types.
Double Diploma Programme; Polish supervisor: Prof. Jerzy Świątek, Wrocław University of Technology
APA, Harvard, Vancouver, ISO, and other styles
35

Trávníček, Petr. "Aplikace data miningu v podnikové praxi." Master's thesis, Vysoká škola ekonomická v Praze, 2011. http://www.nusl.cz/ntk/nusl-164048.

Full text
Abstract:
Over the last decades, knowledge discovery from databases, as one of the information and communication technology disciplines, has developed into its current state, attracting increasing interest not only from major business corporations. This diploma thesis deals with data mining, paying prime attention to its practical utilization within a business environment. The thesis objective is to review possible data mining applications and to decompose implementation techniques, focusing on specific data mining methods and algorithms as well as on the adaptation of business processes. This objective is the subject of the theoretical part of the thesis, which covers the principles of data mining, the process of knowledge discovery from databases, commonly used data mining methods and algorithms, and the tasks typically implemented in this domain. A further objective consists in presenting the benefits of data mining on a model example, which is described in the practical part of the thesis. Besides an evaluation of the created data mining models, the practical part also contains a design of subsequent steps that would enable higher efficiency in some specific areas of the given business. I consider this last point, together with the characterization of the knowledge discovery in databases process, to be the most beneficial contributions of the thesis.
APA, Harvard, Vancouver, ISO, and other styles
36

Tuovinen, L. (Lauri). "From machine learning to learning with machines:remodeling the knowledge discovery process." Doctoral thesis, Oulun yliopisto, 2014. http://urn.fi/urn:isbn:9789526205243.

Full text
Abstract:
Knowledge discovery (KD) technology is used to extract knowledge from large quantities of digital data in an automated fashion. The established process model represents the KD process in a linear and technology-centered manner, as a sequence of transformations that refine raw data into more and more abstract and distilled representations. Any actual KD process, however, has aspects that are not adequately covered by this model. In particular, some of the most important actors in the process are not technological but human, and the operations associated with these actors are interactive rather than sequential in nature. This thesis proposes an augmentation of the established model that addresses this neglected dimension of the KD process. The proposed process model is composed of three sub-models: a data model, a workflow model, and an architectural model. Each sub-model views the KD process from a different angle: the data model examines the process from the perspective of different states of data and transformations that convert data from one state to another, the workflow model describes the actors of the process and the interactions between them, and the architectural model guides the design of software for the execution of the process. For each of the sub-models, the thesis first defines a set of requirements, then presents the solution designed to satisfy the requirements, and finally, re-examines the requirements to show how they are accounted for by the solution. The principal contribution of the thesis is a broader perspective on the KD process than what is currently the mainstream view. The augmented KD process model proposed by the thesis makes use of the established model, but expands it by gathering data management and knowledge representation, KD workflow and software architecture under a single unified model. Furthermore, the proposed model considers issues that are usually either overlooked or treated as separate from the KD process, such as the philosophical aspect of KD. The thesis also discusses a number of technical solutions to individual sub-problems of the KD process, including two software frameworks and four case-study applications that serve as concrete implementations and illustrations of several key features of the proposed process model.
APA, Harvard, Vancouver, ISO, and other styles
37

Maini, Vincenzo. "Price and liquidity discovery, jumps and co-jumps using high frequency data from the foreign exchange markets." Thesis, City University London, 2012. http://openaccess.city.ac.uk/2382/.

Full text
Abstract:
The thesis provides a novel contribution to the literature on microstructural theory and discovery models. The main contributions are twofold. First, we move from price to liquidity discovery and explicitly study the dynamic behavior of a direct measure of liquidity observed in the foreign exchange markets. We extend the framework presented by Hasbrouck (1991) and Dufour and Engle (2000) by allowing the coefficients of both liquidity and trade activity to be time dependent. We find that liquidity time is characterized by a strong stochastic component and that liquidity shocks tend to have temporary effects when transactional time is low or, equivalently, when trading volatility is high. Second, we analyze the contribution of liquidity to systemic risk and contagion and, in particular, assess the price impact of liquidity shocks. We extend the approach of Dumitru and Urga (2012) and present a co-jump testing procedure, robust to microstructural noise and spurious detection, based on a number of combinations of univariate tests for jumps. The proposed test allows us to distinguish between transitory and permanent, and between endogenous and exogenous, co-jumps and to determine a causality effect between price and liquidity. In the empirical application, we find evidence of contemporaneous and permanent co-jumps but little sign of exogenous co-jumps between the price and the available liquidity of EUR/USD FX spot during the week from May 3 to May 7, 2010.
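The univariate jump tests that such co-jump procedures combine typically standardize each high-frequency return by a jump-robust local volatility estimate such as bipower variation. The sketch below is a simplified, Lee-Mykland-style illustration, not the thesis's own test: a return is flagged as a jump when its standardized magnitude exceeds a threshold, and a co-jump when both series jump in the same interval; the window length, threshold, and synthetic data are arbitrary choices for the example.

```python
import numpy as np

def jump_flags(returns, window=50, threshold=4.0):
    """Flag returns that are large relative to local bipower variation.

    A simplified, illustrative variant of the Lee-Mykland statistic:
    the local variance is estimated from |r_j||r_{j-1}| over a trailing
    window, which stays robust to occasional jumps inside the window.
    """
    r = np.asarray(returns, dtype=float)
    flags = np.zeros(len(r), dtype=bool)
    for i in range(window, len(r)):
        w = r[i - window:i]
        bv = (np.pi / 2) * np.mean(np.abs(w[1:]) * np.abs(w[:-1]))
        sigma = np.sqrt(bv)
        if sigma > 0 and abs(r[i]) / sigma > threshold:
            flags[i] = True
    return flags

rng = np.random.default_rng(0)
price_r = rng.normal(0, 1e-4, 2000)   # synthetic price returns
liq_r = rng.normal(0, 1e-4, 2000)     # synthetic liquidity changes
price_r[1500] += 15e-4                # inject a price jump
liq_r[1500] -= 12e-4                  # simultaneous liquidity jump -> co-jump

pj, lj = jump_flags(price_r), jump_flags(liq_r)
co_jumps = np.flatnonzero(pj & lj)
print("co-jump intervals:", co_jumps)
```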
APA, Harvard, Vancouver, ISO, and other styles
38

Sundaramurthy, Gopinath. "A Probabilistic Approach for Automated Discovery of Biomarkers using Expression Data from Microarray or RNA-Seq Datasets." University of Cincinnati / OhioLINK, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1459528594.

Full text
APA, Harvard, Vancouver, ISO, and other styles
39

Prichard, Paul Michael. "An investigation into the discovery potential for SUperSYmmetry at the LHC with early data from the ATLAS detector." Thesis, University of Liverpool, 2010. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.533931.

Full text
APA, Harvard, Vancouver, ISO, and other styles
40

Romero, Bustamante Elkin Giovanni. "Introducing the Water Data Explorer Web Application and Python Library: Uniform Means for Data Discovery and Access from CUAHSI and the WMO WHOS Systems." BYU ScholarsArchive, 2021. https://scholarsarchive.byu.edu/etd/8915.

Full text
Abstract:
There has been growing recognition in recent years of the need for a standardized means of sharing water data on the web. One response to this need was the development of the Consortium of Universities for the Advancement of Hydrologic Science (CUAHSI) Hydrologic Information System (HIS) and its accompanying WaterOneFlow and WaterML protocols. To date, the primary means of accessing data shared through these protocols have been limited to the Microsoft Windows HydroDesktop software, the WaterML R package, and the web-based CUAHSI HydroClient, which serves as an access point to the CUAHSI HIS database. We recognized the need for a new web-based tool that could access data from any system supporting WaterOneFlow web services and WaterML, that could be regionally customized to give access to the most locally relevant portions of the HIS database, and that would provide a means for international government agencies, research teams, and others to make use of the accompanying protocols on a locally managed web application. To fill this need, we developed the open-source, lightweight, installable web application Water Data Explorer (WDE), which supports any WaterOneFlow service and can be customized for different regions containing WaterOneFlow web services. The WDE supports data discovery, visualization, and download for the selected WaterOneFlow services; its structure consists of WaterOneFlow catalogs, servers, and individual measurement stations. The WDE provides different user interfaces for administrators and regular users: a server administrator can specify which datasets an individual instance of the WDE supports, so that end users of the application can access data from those datasets. We modularized the core WaterOneFlow access code into a new open-source Python package called "Pywaterml", which provides the methods used by the WDE to discover, visualize, and download data. This thesis presents the design and development of the WDE and the associated Pywaterml package, carried out in partnership with end-users from the WMO through an iterative design-build process. We present two case studies involving data discovery and visualization from the CUAHSI HIS and the WMO Hydrological Observing System (WHOS). Both case studies demonstrate the regional customization of the WDE, which allows the creation of different custom versions of the same application to meet specific end-user needs. Data discovery in both case studies focuses on discovering the sites contained in a WaterOneFlow web service and on ontology-based discovery of the concept variables in each web service. The data visualization we present focuses on the time-series observations for the sites in each system. Finally, we tested data download by saving each site's information to the WDE database and allowing the user to download the time-series data.
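WaterOneFlow services are plain SOAP web services, so any SOAP client can consume them. The sketch below uses the suds client to call the GetSites and GetValues methods of a hypothetical WaterOneFlow 1.1 endpoint; the WSDL URL and the site/variable codes are placeholders, and the method signatures follow the WaterOneFlow specification as we understand it, not the Pywaterml API described in the thesis.

```python
# A minimal sketch of consuming a WaterOneFlow SOAP service (assumes the
# suds-community package; the WSDL URL below is a placeholder).
from suds.client import Client

WSDL_URL = "http://example.org/cuahsi_1_1.asmx?WSDL"  # hypothetical endpoint

client = Client(WSDL_URL)

# Discover measurement sites exposed by the service
# (empty site list = all sites; empty string = no auth token).
sites_xml = client.service.GetSites([], "")

# Fetch a time series: WaterOneFlow addresses series by network-prefixed
# site and variable codes plus a date range.
values_xml = client.service.GetValues(
    "NETWORK:SiteCode",       # hypothetical site code
    "NETWORK:VariableCode",   # hypothetical variable code
    "2020-01-01",             # start date
    "2020-12-31",             # end date
    "",                       # auth token (unused on public services)
)
print(str(values_xml)[:500])
```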
APA, Harvard, Vancouver, ISO, and other styles
41

Pettersson, Max, and Viktor Jansson. "Predicting rifle shooting accuracy from context and sensor data : A study of how to perform data mining and knowledge discovery in the target shooting domain." Thesis, Tekniska Högskolan, Högskolan i Jönköping, JTH, Datateknik och informatik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:hj:diva-45396.

Full text
Abstract:
The purpose of this thesis is to develop an interpretable model that gives predictions of which factors impacted a shooter's results. Experimentation is our chosen research method. Our three independent variables are weapon movement, trigger pull force, and heart rate; our dependent variable is shooting accuracy. A random forest regression model is trained on the experiment data to produce predictions of shooting accuracy and to show correlations between the independent and dependent variables. Our method shows that an increase in weapon movement, trigger pull force, or heart rate decreases the predicted accuracy score. Weapon movement impacted shooting results the most, at 53.61%, while trigger pull force and heart rate impacted shooting results by 22.20% and 24.18%, respectively. We have also shown that LIME can be a viable method for explaining how the measured factors impacted shooting results. The results of this thesis lay the groundwork for better target-shooting training tools using explainable prediction models with sensors.
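As a hedged sketch of this modelling approach, the snippet below fits a scikit-learn random forest on the three predictors and asks LIME's tabular explainer to attribute one prediction to them. The data are synthetic stand-ins for the sensor recordings, and the assumed relation between the factors and accuracy is invented for the illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from lime.lime_tabular import LimeTabularExplainer

rng = np.random.default_rng(42)
n = 500
# Synthetic stand-ins for the three measured factors.
X = np.column_stack([
    rng.uniform(0, 5, n),     # weapon movement (arbitrary units)
    rng.uniform(5, 40, n),    # trigger pull force (N)
    rng.uniform(50, 160, n),  # heart rate (bpm)
])
# Accuracy decreases with all three factors, plus noise (assumed relation).
y = 10 - 1.2 * X[:, 0] - 0.05 * X[:, 1] - 0.02 * X[:, 2] + rng.normal(0, 0.5, n)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
print("feature importances:", model.feature_importances_)

explainer = LimeTabularExplainer(
    X, feature_names=["weapon_movement", "trigger_pull_force", "heart_rate"],
    mode="regression")
explanation = explainer.explain_instance(X[0], model.predict, num_features=3)
print(explanation.as_list())  # per-feature contributions for this one shot
```

The forest's global feature importances play the role of the percentage impacts reported above, while the LIME output explains a single shot, which is what makes the model interpretable at the level of an individual result.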
APA, Harvard, Vancouver, ISO, and other styles
42

Severini, Nicola. "Analysis, Development and Experimentation of a Cognitive Discovery Pipeline for the Generation of Insights from Informal Knowledge." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2020. http://amslaurea.unibo.it/21013/.

Full text
Abstract:
The purpose of this thesis project is to apply Cognitive Discovery to an informal type of knowledge. Cognitive Discovery is a term coined by IBM Research to denote a series of Information Extraction (IE) processes used to build a knowledge graph capable of representing knowledge from highly unstructured data such as text. Cognitive Discovery is typically applied to formal knowledge, i.e. documented text such as academic papers, business reports, patents, etc. Informal knowledge, by contrast, arises, for example, from recording a conversation within a meeting or from a PowerPoint presentation, and is therefore not formally defined. The idea behind the project is the same as that of the original Cognitive Discovery project: processing natural language in order to build a knowledge graph that can be interrogated in different ways. The architecture of this knowledge graph depends on the use case, but it is typically a network of entity nodes connected to each other through semantic relationships and to nodes containing structural data such as a paragraph, an image, or a slide from a presentation. The creation of this graph requires a series of steps: a data-processing pipeline that, starting from the raw data (in the specific case of the prototype, the audio file of the conversation), extracts and processes features such as entities, semantic relationships between entities, and main concepts. Once the graph has been created, it is necessary to define an engine for querying and/or generating insights from it. In general, the graph database infrastructure provides a language for querying the graph, but to make the application usable even for those without the technical knowledge needed to learn the query language, a component was defined to translate natural language queries into graph queries.
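To make the graph structure concrete, the following minimal sketch (not the thesis prototype) loads hypothetical (entity, relation, entity) triples produced by the IE steps into a networkx multigraph, attaches paragraph nodes as structural data, and answers a simple neighbourhood query. All names and triples are invented for the illustration.

```python
import networkx as nx

# Hypothetical output of the IE steps: (subject, relation, object) triples
# plus the structural unit (e.g. a transcript paragraph) they came from.
triples = [
    ("Alice", "works_on", "Project X", "para-1"),
    ("Project X", "uses", "Knowledge Graph", "para-1"),
    ("Bob", "presented", "Project X", "para-2"),
]

g = nx.MultiDiGraph()
for subj, rel, obj, source in triples:
    g.add_node(subj, kind="entity")
    g.add_node(obj, kind="entity")
    g.add_node(source, kind="paragraph")   # structural data node
    g.add_edge(subj, obj, relation=rel)
    g.add_edge(subj, source, relation="mentioned_in")
    g.add_edge(obj, source, relation="mentioned_in")

# A toy "query engine": everything directly related to one entity.
entity = "Project X"
for _, nbr, data in g.out_edges(entity, data=True):
    print(f"{entity} -[{data['relation']}]-> {nbr}")
for pred, _, data in g.in_edges(entity, data=True):
    print(f"{pred} -[{data['relation']}]-> {entity}")
```

In a real deployment the same role is played by a graph database and its query language (e.g. Cypher or Gremlin); the natural-language component described above would compile user questions into such queries.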
APA, Harvard, Vancouver, ISO, and other styles
43

Sengstock, Christian [Verfasser], and Michael [Akademischer Betreuer] Gertz. "Geographic Feature Mining: Framework and Fundamental Tasks for Geographic Knowledge Discovery from User-generated Data / Christian Sengstock ; Betreuer: Michael Gertz." Heidelberg : Universitätsbibliothek Heidelberg, 2015. http://d-nb.info/1180395662/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
44

Nettling, Arthur Martin [Verfasser]. "New approaches for de-novo motif discovery using phylogenetic footprinting : from data acquisition to motif visualization ; [kumulative Dissertation] / Arthur Martin Nettling." Halle, 2017. http://d-nb.info/113307412X/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
45

ABATE, NICODEMO. "Towards an operational use of remote sensing data (satellite, drone, and ground) for Cultural Heritage: from discovery to documentation, monitoring, and valorisation." Doctoral thesis, Università degli studi della Basilicata, 2022. http://hdl.handle.net/11563/158570.

Full text
Abstract:
The research work was developed from the idea of an operational and practical use of remote sensing data from satellite, drone, and ground, as suggested by the title. "Towards an operational use of remote sensing data (satellite, drone, and ground) for Cultural Heritage: from discovery to documentation, monitoring and valorisation" was carried out as part of a PhD at the University of Basilicata, but a fundamental contribution was made by the opportunity offered by the National Research Council (CNR) - Institute of Methodologies for Environmental Analysis (IMAA) and Institute of Heritage Sciences (ISPC) - to carry out the research as part of their projects. The CNR was useful and fruitful for the development of new ideas and for access to otherwise inaccessible technologies and tools. Above all, the National Research Council was valuable for the people and researchers who provided the author with expertise, advice, experience, and support in times of need. For this reason, the work proposed within the individual chapters follows the same idea and is the result of research carried out in Italy and around the world over the last three years. The main aim was to research methodologies, theories, and tools useful for real case studies, which could support archaeological research both in its discovery and knowledge phases and in the planning and prevention of events damaging cultural and natural heritage. The activity focused on the predominant use of open-source tools and freely accessible (open) data, where available; in particular, on the use of large databases and powerful computing platforms made available online free of charge by large service providers such as Google, the European Space Agency, the Italian Space Agency, and NASA (National Aeronautics and Space Administration). The choice of structuring methods and workflows based on open source and open data was also dictated by the desire to be able to reapply the same methodologies on a global scale to (i) test their robustness, and (ii) provide a reusable and replicable tool. The chapters focus on (i) the use of the ESA (European Space Agency) Copernicus Sentinel-2 and Sentinel-1 satellites and the NASA Landsat-7 TM and Landsat-8 OLI satellites for the discovery of CH and preservation of the CNH; (ii) the use of tools for the management of Big and Open Data; (iii) the use of the new PRISMA (PRecursore IperSpettrale della Missione Applicativa) hyperspectral data of the ASI (Italian Space Agency) for the discovery of new archaeological sites; (iv) the use of close-range technologies such as UASs (Unmanned Aerial Systems) for the discovery of buried structures; (v) the integrated use of different RS technologies (satellite, UAS, and geophysics) for the discovery and reconstruction of ancient contexts; and, of course, the use of the related pre- and post-processing methodologies. The methodologies used and previous studies on specific topics are set out in more detail in the individual chapters; this choice was made because it is considered more explanatory and didactic in the context of a global reading of the entire work. The chapters are structured in the form of papers, some of which have already been published in peer-reviewed journals (e.g. Remote Sensing, IEEE Geoscience and Remote Sensing, etc.).
The topics covered are Remote Sensing (RS) and Earth Observation (EO) applied to the discovery, protection, and safeguarding of Cultural and Natural Heritage, with several methodologies and different hardware and software tools.
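As an assumed illustration of the kind of free, cloud-based processing such work builds on, the sketch below uses the Google Earth Engine Python API to compute a cloud-filtered Sentinel-2 NDVI composite over an area of interest, a typical first step when searching for vegetation marks; the coordinates, dates, and thresholds are placeholders, and an authenticated Earth Engine account is assumed.

```python
import ee

ee.Initialize()  # assumes a previously authenticated Earth Engine account

# Hypothetical area of interest: a 2 km buffer around a point.
aoi = ee.Geometry.Point([15.80, 40.63]).buffer(2000)

# Cloud-filtered summer composite from the Sentinel-2 surface reflectance
# collection; CLOUDY_PIXEL_PERCENTAGE is a standard image property.
composite = (
    ee.ImageCollection("COPERNICUS/S2_SR")
    .filterBounds(aoi)
    .filterDate("2021-06-01", "2021-08-31")
    .filter(ee.Filter.lt("CLOUDY_PIXEL_PERCENTAGE", 20))
    .median()
)

# NDVI from the near-infrared (B8) and red (B4) bands.
ndvi = composite.normalizedDifference(["B8", "B4"]).rename("NDVI")
stats = ndvi.reduceRegion(ee.Reducer.minMax(), aoi, 10)
print(stats.getInfo())
```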
APA, Harvard, Vancouver, ISO, and other styles
46

Fihn, John, and Johan Finndahl. "A Framework for How to Make Use of an Automatic Passenger Counting System." Thesis, Uppsala universitet, Datorteknik, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-158139.

Full text
Abstract:
Most modern cities today face tremendous traffic congestion, a consequence of the increasing use of private motor vehicles. Public transport plays a crucial role in reducing this traffic, but to be an attractive alternative to private motor vehicles it needs to provide services that suit citizens' requirements for travelling. A system that can provide transit agencies with rapid feedback about the usage of their transport network is the Automatic Passenger Counting (APC) system, which registers the number of passengers boarding and alighting a vehicle. Knowledge about passengers' travel behaviour can be used by transit agencies to adapt and improve their services, but to gain this knowledge transit agencies need to know how to use an APC system. This thesis investigates how a transit agency can make use of an APC system. The research took place in Melbourne, where Yarra Trams, the operator of the tram network, is now putting effort into utilising its APC system. A theoretical framework based on theories of Knowledge Discovery from Data, System Development, and Human-Computer Interaction is built, tested, and evaluated in a case study at Yarra Trams. The case study resulted in a software system that can process and model Yarra Trams' APC data. The result of the research is a proposed framework consisting of different steps and events that can be used as a guide for a transit agency that wants to make use of an APC system.
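A basic operation when processing APC records is turning per-stop boarding and alighting counts into an on-board load profile along a trip. The sketch below shows this with pandas; the column names and counts are invented and do not reflect Yarra Trams' actual schema.

```python
import pandas as pd

# Hypothetical APC records for one tram trip.
apc = pd.DataFrame({
    "stop_sequence": [1, 2, 3, 4, 5],
    "boardings":     [12, 8, 5, 3, 0],
    "alightings":    [0, 2, 6, 9, 11],
})

# On-board load after each stop: cumulative boardings minus alightings.
apc = apc.sort_values("stop_sequence")
apc["load"] = (apc["boardings"] - apc["alightings"]).cumsum()
print(apc)  # the load column peaks mid-route and returns to zero at the end
```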
APA, Harvard, Vancouver, ISO, and other styles
47

Beth, Madariaga Daniel Guillermo. "Identificación de las tendencias de reclamos presentes en reclamos.cl y que apunten contra instituciones de educación y organizaciones públicas." Tesis, Universidad de Chile, 2012. http://www.repositorio.uchile.cl/handle/2250/113396.

Full text
Abstract:
Industrial Civil Engineer
This thesis seeks to verify, through a practical and applied experiment, whether the use of Web Opinion Mining (WOM) techniques and software tools makes it possible to determine the general trends present in a set of opinions on the Web; specifically, the complaints published on the website Reclamos.cl that target institutions belonging to the Chilean Education and Government sectors. Consumers increasingly use the Web to publish their positive and negative assessments of what they acquire in the market, which makes it a gold mine for many institutions, especially for identifying the strengths and weaknesses of the products and services they offer, their public image, and several other aspects. Concretely, the experiment is carried out through the design and execution of a software application that integrates and implements WOM concepts such as Knowledge Discovery from Data (KDD), as a methodological framework for reaching the stated objective, and Latent Dirichlet Allocation (LDA), for detecting topics within the contents of the complaints addressed. The application also uses object-oriented programming based on the Python language, stores data in relational databases, and incorporates prefabricated tools to simplify certain required tasks. Running the application downloaded the web pages containing the complaints of interest, detecting 6,460 such complaints, directed at 245 institutions and published between July 13, 2006 and December 5, 2011. Using stop-word lists and lemmatization tools, the application processed the contents of the complaints, keeping only the canonical forms of the words that constituted them and contributed meaning. The application then ran several LDA analyses over these contents, defined to be executed for each detected institution, both over the full set of its complaints and over segments grouped by year of publication, generating for each analysis 20 topics of 30 words each. With the results of the LDA analyses, and through a methodology of manually reading and interpreting the words constituting each of the obtained topic sets, phrases and sentences were generated to link them together, in order to obtain an interpretation reflecting the trend to which the complaints represented by these results pointed. It was concluded that it is possible to detect the general trends of the complaints using WOM techniques, although with caveats: since the trends emerge from a manual interpretation process, subjectivity can arise around the object to which those trends point, due to the interests, experience, and other characteristics of the person interpreting the results.
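The LDA step described above can be sketched with scikit-learn: vectorize the lemmatized complaint texts, fit a 20-topic model, and list the highest-weight words per topic, mirroring the 20-topics-by-30-words output reported in the thesis. The toy corpus below is a placeholder standing in for the 6,460 real complaints.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Placeholder lemmatized complaints; the real corpus held 6,460 documents.
documents = [
    "cobro boleta error banco reembolso",
    "atencion cliente demora respuesta reclamo",
    "matricula universidad cobro demora certificado",
    "servicio tecnico demora reparacion equipo",
]

vectorizer = CountVectorizer()
dtm = vectorizer.fit_transform(documents)  # document-term matrix

# 20 topics of 30 words each, as in the thesis (toy corpus shown here).
lda = LatentDirichletAllocation(n_components=20, random_state=0).fit(dtm)

vocab = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_[:3]):  # print the first 3 topics
    top = weights.argsort()[::-1][:30]             # 30 highest-weight words
    print(f"topic {k}: " + " ".join(vocab[i] for i in top))
```

The manual step the thesis describes, reading each topic's word list and composing sentences that link the words into an interpretable trend, starts from exactly this kind of output.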
APA, Harvard, Vancouver, ISO, and other styles
48

Lima, Junior José. "Descoberta de equivalência semântica entre atributos em bancos de dados utilizando redes neurais." reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2004. http://hdl.handle.net/10183/12012.

Full text
Abstract:
With the increasing number of companies using database technologies, database administrators create new schemes all the time, and in most cases there is no normalization or formal procedure to carry out this task in a homogeneous form; this results in incompatible databases, which makes data exchange difficult. When Database Systems (DBS) are designed and implemented independently, data incompatibilities among different DBS are normal. Problems related to attribute names, storage in different measurement units, different levels of detail, different attributes with the same name or equal attributes with different names, different data types, sizes, precision, etc., can be cited as the main conflicts existing in DBS schemes. These problems compromise information quality and generate higher costs for data maintenance. They arise as the consequence of redundantly specified attributes. These facts have caused great interest in discovering knowledge in databases to identify semantically equivalent information stored in the schemes. The process capable of discovering this knowledge in databases is called KDD (Knowledge Discovery in Databases). The available tools for KDD tasks are generic and derived from other areas of knowledge, in particular statistics and artificial intelligence. Artificial neural networks (ANN) have been used in systems whose aim is the identification of previously unknown patterns. These networks can learn similarities among the data directly from instances, without a priori knowledge. An ANN that has been used successfully to identify semantic equivalence is the Self-Organizing Map (SOM). This research aims to discover, in a semi-automatic way, semantic equivalence between database attributes, contributing to the management and integration of these databases. This work resulted in a systematic procedure for the discovery process and a tool that implements it.
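As an assumed illustration of the SOM idea (using the minisom package rather than the tool built in this work): each attribute is described by a numeric feature vector, e.g. normalized summary statistics computed from its column instances, and attributes mapped to the same or neighbouring map units become candidates for semantic equivalence. The attribute names and feature vectors below are invented.

```python
import numpy as np
from minisom import MiniSom

# Hypothetical per-attribute features, e.g. normalized summary statistics
# (mean, std, distinct ratio, null ratio) computed from column instances.
attributes = ["cust_name", "client_nm", "birth_dt", "dob", "salary"]
features = np.array([
    [0.10, 0.30, 0.95, 0.01],
    [0.11, 0.29, 0.96, 0.02],
    [0.55, 0.10, 0.40, 0.00],
    [0.54, 0.11, 0.41, 0.01],
    [0.90, 0.60, 0.70, 0.05],
])

# Train a small 4x4 map on the attribute vectors.
som = MiniSom(4, 4, input_len=4, sigma=1.0, learning_rate=0.5, random_seed=1)
som.train_random(features, 500)

# Attributes that share a best-matching unit are equivalence candidates.
for name, vec in zip(attributes, features):
    print(name, "->", som.winner(vec))
```

With these vectors, cust_name/client_nm and birth_dt/dob land on the same or adjacent units, which is precisely the kind of candidate pair a human reviewer would then confirm in the semi-automatic process described above.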
APA, Harvard, Vancouver, ISO, and other styles
49

Kasík, Josef. "Empirické porovnání volně dostupných systémů dobývání znalostí z databází." Master's thesis, Vysoká škola ekonomická v Praze, 2009. http://www.nusl.cz/ntk/nusl-10731.

Full text
Abstract:
The topic and main objective of this diploma thesis is a comparison of free data mining suites. The subjects of the comparison are six applications developed under university projects as experimental tools for data mining and as media for educational purposes. The comparison criteria are derived from four general aspects that form the basis for further analysis. Each system is evaluated as a tool for handling real-time data mining tasks, as a tool supporting the various phases of the CRISP-DM methodology, as a tool capable of practical employment on certain data, and as a common software system. These aspects yield 31 particular criteria, whose evaluation was determined by a thorough analysis of each system. The results of the comparison confirmed the anticipated assumption: the Weka data mining suite was evaluated as the best tool. The main advantages of Weka are its high number of machine learning algorithms, its numerous data preparation tools, and its speed of processing.
APA, Harvard, Vancouver, ISO, and other styles
50

Keedwell, Edward. "Knowledge discovery from gene expression data using neural-genetic models." Thesis, University of Exeter, 2003. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.288704.

Full text
APA, Harvard, Vancouver, ISO, and other styles