Dissertations / Theses on the topic 'KNOWLEDGE DISCOVERY BASED TECHNIQUE'

1

Mohd, Saudi Madihah. "A new model for worm detection and response : development and evaluation of a new model based on knowledge discovery and data mining techniques to detect and respond to worm infection by integrating incident response, security metrics and apoptosis." Thesis, University of Bradford, 2011. http://hdl.handle.net/10454/5410.

Abstract:
Worms have become increasingly sophisticated and now integrate a range of advanced techniques, which makes detection and response much harder and slower than in the past. This thesis therefore builds the STAKCERT (Starter Kit for Computer Emergency Response Team) model to detect worm attacks so that they can be responded to more efficiently. The novelty and strength of the STAKCERT model lie in its method, which consists of the STAKCERT KDD processes and the development of the STAKCERT worm classification, the STAKCERT relational model and the STAKCERT worm apoptosis algorithm. The new concept introduced in this model, apoptosis, is borrowed from the human immune system and mapped onto a security perspective. The encouraging results are validated by applying security metrics that assign the weight and severity values used to trigger apoptosis. To optimise performance, the model draws on standard operating procedures (SOP) for worm incident response, which involve static and dynamic analyses; knowledge discovery (KDD) techniques for building the STAKCERT model; and data mining algorithms. The STAKCERT model produced encouraging results and outperformed comparable existing work on worm detection, with an overall accuracy of 98.75%, a false positive rate of 0.2% and a false negative rate of 1.45%. Worm response achieved an accuracy of 98.08%, which other researchers can use as a point of comparison in future work.
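The headline figures above are conventional detection metrics. The abstract does not spell out how they were computed, so the sketch below assumes the standard confusion-matrix definitions; the function name `detection_metrics` and the counts in the usage line are hypothetical, not data from the thesis.

```python
def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard confusion-matrix rates (assumed definitions, not taken from the thesis)."""
    total = tp + fp + tn + fn
    return {
        # Share of all samples classified correctly
        "accuracy": (tp + tn) / total,
        # Share of benign samples wrongly flagged as worms
        "false_positive_rate": fp / (fp + tn),
        # Share of worm samples the detector missed
        "false_negative_rate": fn / (fn + tp),
    }


# Hypothetical counts, for illustration only
print(detection_metrics(tp=90, fp=2, tn=98, fn=10))
```

Keeping the three rates together makes it easy to see that a high accuracy can coexist with very different false positive and false negative rates.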
2

Radovanovic, Aleksandar. "Concept Based Knowledge Discovery from Biomedical Literature." Thesis, Online access, 2009. http://etd.uwc.ac.za/usrfiles/modules/etd/docs/etd_gen8Srv25Nme4_9861_1272229462.pdf.

3

Aamot, Elias. "Literature-based knowledge discovery in climate science." Thesis, Norges teknisk-naturvitenskapelige universitet, Institutt for datateknikk og informasjonsvitenskap, 2014. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-27047.

Abstract:
Climate change caused by anthropogenic activity is one of the biggest challenges of our time. Researchers are striving to understand the effects of global warming on the ecological systems of the oceans, and how these ecological systems influence the global climate, a line of research that is crucial for counteracting or adapting to the effects of global warming. A major challenge that researchers in this area face is the huge amount of potentially relevant literature, as insights from widely different fields such as biology, chemistry, climatology and oceanography can prove crucial to understanding the effects of global warming on the oceans. To relieve researchers of some of this workload, information extraction tools can be used to extract relevant information from the scientific literature automatically, and discovery support tools can be developed to assist researchers in their efforts. This master's thesis conducts fundamental research into the development of discovery support tools for oceanographic climate science, focusing primarily on the information extraction component.
4

Shelke, Yuri Rajendra. "Knowledge Based Topology Discovery and Geo-localization." The Ohio State University, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=osu1276877783.

5

Yildiz, Meliha Yetisgen. "Using statistical and knowledge-based approaches for literature-based discovery." Thesis, Connect to this title online; UW restricted, 2007. http://hdl.handle.net/1773/7178.

6

Vermilyer, Robert. "Knowledge Discovery in Content-Based Image Retrieval Systems." NSUWorks, 2005. http://nsuworks.nova.edu/gscis_etd/898.

Abstract:
The advent of the World Wide Web and digital photography has led to a phenomenal increase in the number and complexity of stored images. Accordingly, the ability to browse and retrieve images based on their content is of rapidly growing importance. The goals of this research project are to develop a Content-Based Image Retrieval (CBIR) system that combines dynamic, user-driven search capabilities with artificial intelligence techniques, and to examine the system's effectiveness. The experimental method will be used to test the specific hypotheses and research questions proposed in this project. All of the experiments will be conducted using a CBIR prototype system that incorporates intelligent User Interface Agents (UIA). The UIAs will use both neural networks and an expert reasoning system. The experiments will follow a task-oriented approach, with both descriptive and analytical statistics used to assess the results. In addition, a new CBIR evaluation metric will be proposed and applied. It is expected that this research will benefit CBIR research and CBIR system development by: 1) demonstrating the effectiveness of providing users with an interface that allows them to sketch an image, provides a relevance feedback mechanism based on similar images, and offers query refinement suggestions; 2) presenting a reusable modular design approach that can be used to create CBIR systems; 3) showing how AI techniques, particularly intelligent User Interface Agents, can be used effectively in CBIR systems; 4) proposing a "standard" CBIR user interface; and 5) proposing a new CBIR evaluation metric. The results of this research project should advance the current state of CBIR, in that the project designs, implements and evaluates an interactive CBIR system that uses image input and incorporates both the user's interactive guidance and artificial intelligence techniques to access images.
7

Ajala, Adebunmi Elizabeth. "Acquiring and filtering knowledge : discovery & case-based reasoning." Thesis, University of Surrey, 2006. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.433304.

8

Phan, John H. "Biomarker discovery and clinical outcome prediction using knowledge based-bioinformatics." Diss., Georgia Institute of Technology, 2009. http://hdl.handle.net/1853/33855.

Abstract:
Advances in high-throughput genomic and proteomic technology have led to growing interest in cancer biomarkers. These biomarkers can potentially improve the accuracy of cancer subtype prediction and, subsequently, the success of therapy. However, identification of statistically and biologically relevant biomarkers from high-throughput data can be unreliable due to the nature of the data (e.g., high technical variability, small sample size, and high dimensionality). Due to the lack of available training samples, data-driven machine learning methods are often insufficient without the support of knowledge-based algorithms. We investigate the benefits of using knowledge-based algorithms to solve clinical prediction problems. Because we are interested in identifying biomarkers that are also feasible in clinical prediction models, we focus on two analytical components: feature selection and predictive model selection. In addition to data variance, we must also consider the variance of analytical methods. There are many existing feature selection algorithms, each of which may produce different results. Moreover, it is not trivial to identify model parameters that maximize the sensitivity and specificity of clinical prediction. Thus, we introduce a method that uses independently validated biological knowledge to reduce the space of relevant feature selection algorithms and to improve the reliability of clinical predictors. Finally, we implement several functions of this knowledge-based method as a web-based, user-friendly, and standards-compatible software application.
9

Yu, Zhiguo. "Cooperative Semantic Information Processing for Literature-Based Biomedical Knowledge Discovery." UKnowledge, 2013. http://uknowledge.uky.edu/ece_etds/33.

Abstract:
Given that data is increasing exponentially every day, extracting and understanding the information, themes and relationships in large collections of documents is increasingly important to researchers in many areas. In this work, we present a cooperative semantic information processing system to help biomedical researchers understand and discover knowledge in large numbers of titles and abstracts from PubMed query results. Our system is based on a prevalent technique, topic modeling, which is an unsupervised machine learning approach for discovering the set of semantic themes in a large set of documents. In addition, we apply a natural language processing technique to transform the “bag-of-words” assumption of topic models into a “bag-of-important-phrases” assumption, and we build an interactive visualization tool using a modified version of the open-source Topic Browser. Finally, we conduct two experiments to evaluate the approach. The first evaluates whether the “bag-of-important-phrases” approach is better at identifying semantic themes than the standard “bag-of-words” approach. This is an empirical study in which human subjects evaluate the quality of the resulting topics using a standard “word intrusion test” to determine whether subjects can identify a word (or phrase) that does not belong in the topic. The second is a qualitative empirical study to evaluate how well the system helps biomedical researchers explore a set of documents to discover previously hidden semantic themes and connections. The methodology for this study has been successfully used to evaluate other knowledge-discovery tools in biomedicine.
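In a word intrusion test, a word (or phrase) that does not belong is inserted among a topic's top terms, and subjects are asked to spot it. One common summary, often called model precision, is the fraction of subjects who pick the injected intruder. The sketch below assumes that summary; the abstract does not say how the thesis aggregated responses, and the function name and responses are hypothetical.

```python
from collections import Counter


def model_precision(choices: list[str], intruder: str) -> float:
    """Fraction of subjects who picked the injected intruder for one topic."""
    if not choices:
        raise ValueError("need at least one subject response")
    return Counter(choices)[intruder] / len(choices)


# Hypothetical responses from four subjects; "tax policy" is the injected intruder
responses = ["tax policy", "tax policy", "dna repair", "tax policy"]
print(model_precision(responses, "tax policy"))  # 0.75
```

A topic whose intruder is easy to spot scores close to 1, which is the intuition behind comparing “bag-of-words” and “bag-of-important-phrases” topics on this measure.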
10

Siochi, Fernando C. "Building a knowledge based simulation optimization system with discovery learning." Diss., This resource online, 1995. http://scholar.lib.vt.edu/theses/available/etd-06062008-155425/.

11

Engels, Robert. "Component based user guidance in knowledge discovery and data mining." Sankt Augustin : Infix, 1999. http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&doc_number=008752552&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA.

12

Emami, Leila. "Conceptual Browser, a concept-based knowledge extraction technique." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1999. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape7/PQDD_0001/MQ43162.pdf.

13

Wang, Keqin. "Knowledge discovery in manufacturing quality data to support product design decision making." Troyes, 2010. http://www.theses.fr/2010TROY0005.

Abstract:
Product design involves a large number of decisions, and relevant, effective knowledge support is important for these decisions. Most existing work has focused on design knowledge as support for design. However, manufacturing knowledge about product quality, which also matters, has received insufficient attention. Meanwhile, large volumes of manufacturing data are generated and recorded, and the knowledge needed for production is implicit in these data. The work presented in this thesis focuses on extracting, from these data, manufacturing quality knowledge that is useful to product designers, using data exploration and experience-feedback methods (an ontology grouping the elements that are important for decision making was defined). Data mining techniques are then used to address production quality knowledge problems. A prototype supporting decision making in product design was defined; it takes quality criteria into account in knowledge extraction and search.
This work studies knowledge extraction in manufacturing quality data (MQD) for supporting design decisions. Firstly, an ontological approach for analyzing design decisions and identifying designers' needs for manufacturing quality knowledge is proposed. The decisions analyzed range from task clarification, through conceptual design and embodiment design, to detail design. A decision model is proposed in which decisions and their knowledge elements are illustrated. An ontology is constructed to represent the decisions and their knowledge needs. Secondly, MQD preparation for further knowledge discovery is described, including the nature of data in manufacturing. A GT (group technology) and QBOM (Quality Bill of Material)-based method is proposed to classify and organize MQD. As an important factor, the data quality (DQ) issues related to MQD are also analyzed for data mining (DM) application. A QFD (quality function deployment)-based approach is proposed for translating data consumers' DQ needs into specific DQ dimensions and initiatives. Thirdly, a DM-based manufacturing quality knowledge discovery method is proposed and validated through two popular DM functions and related algorithms. The two DM functions are illustrated with real-world data sets from two different production lines. Fourthly, an MQD-based design support prototype is developed. The prototype includes three major functions: data input, knowledge extraction and input, and knowledge search.
14

Ni, Weizeng. "Ontology-based Feature Construction on Non-structured Data." University of Cincinnati / OhioLINK, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1439309340.

15

Neznanov, Alexey A., Dmitry A. Ilvovsky, and Sergei O. Kuznetsov. "FCART: A New FCA-based System for Data Analysis and Knowledge Discovery." Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2013. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-113161.

Abstract:
We introduce a new software system called Formal Concept Analysis Research Toolbox (FCART). Our goal is to create a universal integrated environment for knowledge and data engineers. FCART is constructed upon an iterative data analysis methodology and provides a built-in set of research tools based on Formal Concept Analysis techniques for working with object-attribute data representations. The provided toolset allows for the fast integration of extensions on several levels: from internal scripts to plugins. FCART was successfully applied in several data mining and knowledge discovery tasks. Examples of applying the system in medicine and criminal investigations are considered.
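FCART itself is a full toolbox. As a much smaller illustration of the object-attribute representation that Formal Concept Analysis works with, the brute-force sketch below enumerates the formal concepts of a tiny context using the two standard FCA derivation operators. It is not FCART's algorithm (practical systems use far more efficient methods such as NextClosure), and the function name and example context are hypothetical.

```python
from itertools import combinations


def formal_concepts(context: dict) -> set:
    """Brute-force enumeration of formal concepts.

    context maps each object to its set of attributes; object names must be
    sortable. Returns a set of (extent, intent) pairs of frozensets.
    """
    objects = set(context)
    attributes = set().union(*context.values())

    def common_attributes(objs):
        # Derivation A': attributes shared by every object in objs
        result = set(attributes)
        for o in objs:
            result &= context[o]
        return frozenset(result)

    def common_objects(attrs):
        # Derivation B': objects that have every attribute in attrs
        return frozenset(o for o in objects if attrs <= context[o])

    concepts = set()
    # For any object set X, (X'', X') is a formal concept, so trying every
    # subset finds them all (exponential, fine only for toy contexts).
    for k in range(len(objects) + 1):
        for objs in combinations(sorted(objects), k):
            intent = common_attributes(objs)
            extent = common_objects(intent)
            concepts.add((extent, intent))
    return concepts


context = {"o1": {"a", "b"}, "o2": {"b", "c"}}
for extent, intent in formal_concepts(context):
    print(sorted(extent), sorted(intent))
```

Each printed pair is a maximal group of objects together with exactly the attributes they all share, which is the basic unit that FCA-based tools organise into a concept lattice.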
16

Jones, David. "Improving engineering information access and knowledge discovery through model-based information navigation." Thesis, University of Bristol, 2019. http://hdl.handle.net/1983/2d1c1535-e582-41fd-a6f6-cc1178c21d2a.

Abstract:
An organisation's data, information, and knowledge are widely considered to be one of its greatest assets. As such, their capture, storage and dissemination are the focus of both academic and organisational efforts. This is true at the Airbus Group, the industrial partner of this thesis. Its Knowledge Management team invests in state-of-the-art tools and techniques, and actively participates in research in a bid to maximise the organisation's reuse of knowledge and ultimately its competitiveness. A successful knowledge management strategy creates a knowledgeable and wise workforce that benefits both the individual and the organisation. Disseminating information and knowledge so that it is easily and readily accessible is one key aspect of such a strategy. Search engines are a typical means of disseminating information and knowledge; yet, unlike Internet search, search within organisations (intranet or enterprise search) is frequently found lacking. This thesis contributes to this area of knowledge management. Research in enterprise search has shown that search can be improved by using context to expand search queries. The novel approach taken in this thesis takes this context and applies it visually, moving the search for information away from a text-based user interface towards one that reflects the function and form of the product. The approach, model-based information navigation, rests on the premise that a model-based user interface that leverages the visual and functional way engineers work can improve information access and knowledge discovery.
From the perspectives of information visualisation, engineering information management, product life-cycle management, and building information modelling, this thesis contributes: the development of techniques that enable documents to be indexed against the product structure; the development of techniques for navigation within engineering three-dimensional virtual environments; the design of a range of visual information objects for displaying information within engineering three-dimensional virtual environments; and the determination of the affordance of a model-based approach to information navigation. This thesis presents the development of a framework for model-based information navigation: a novel approach to finding information that places a three-dimensional representation of the product at the heart of searching document collections.
17

Al, Harbi H. Y. M. "Semantically aware hierarchical Bayesian network model for knowledge discovery in data : an ontology-based framework." Thesis, University of Salford, 2017. http://usir.salford.ac.uk/43293/.

Abstract:
Many mining algorithms have been developed over recent decades. However, many of them are confined to generating frequent patterns and do not show how to act upon them. Hence, many researchers have argued that existing mining algorithms have limitations with respect to performance and workability. Quantity and quality are the main limitations: quantity refers to the sheer abundance of the generated patterns, while quality refers to the fact that they cannot be integrated seamlessly into the business domain. Consequently, recent research has suggested that these limitations result from treating the mining process as an isolated and autonomous data-driven trial-and-error process that ignores domain knowledge. Accordingly, integrating domain knowledge into the mining process has become the goal of recent data mining algorithms. Domain knowledge can be represented using various techniques, but recent research has stated that ontology is the natural way to represent knowledge for data mining use. The structural nature of ontology makes it a very strong candidate for integrating domain knowledge with data mining algorithms. It has been claimed that ontology can play the following roles in the data mining process:
• Bridging the semantic gap.
• Providing prior knowledge and constraints.
• Formally representing the DM results.
Although a variety of research has used ontology to enrich different tasks in the data mining process, recent research has revealed that a framework that systematically consolidates ontology and mining algorithms in an intelligent mining environment has not yet been realised. Hence, this thesis proposes an automatic, systematic and flexible framework that integrates the Hierarchical Bayesian Network (HBN) and domain ontology.
The ultimate aim of this thesis is to propose a data mining framework that implicitly caters for the underlying domain knowledge and eventually leads to a more intelligent and accurate mining process. To a certain extent, the proposed mining model will simulate the human cognitive system. The similarity between ontology, the Bayesian Network (BN) and bioinformatics applications establishes a strong connection between these research disciplines. This similarity can be summarised in the following points:
• Both ontology and BN have a graph-based structure.
• Biomedical applications are known for their uncertainty; likewise, BN is a powerful tool for reasoning under uncertainty.
• The medical data involved in biomedical applications is comprehensive, and ontology is the right model for representing comprehensive data.
Hence, the proposed ontology-based Semantically Aware Hierarchical Bayesian Network (SAHBN) is applied to eight biomedical data sets concerning the prediction of the effect of DNA repair genes on the human ageing process and the identification of hub proteins. The performance of SAHBN was then compared with existing Bayesian-based classification algorithms; overall, SAHBN demonstrated very competitive performance. The contributions of this thesis can be summarised as follows:
• An automatic, systematic and flexible framework is proposed to integrate ontology and the HBN. Based on the literature review, and to the best of our knowledge, no such framework has been proposed previously.
• Learning HBN structure from observed data is highly complex; the proposed SAHBN model uses domain knowledge in the form of ontology to overcome this challenge.
• The proposed SAHBN model preserves the advantages of both ontology and Bayesian theory. It integrates Bayesian uncertainty with the deterministic nature of ontology without extending the ontology structure or adding probability-specific properties that violate the standard ontology structure.
• The proposed SAHBN uses domain knowledge in the form of ontology to define the semantic relationships between the attributes involved in the mining process, guide the construction of the HBN structure, check the consistency of the training data set and facilitate the calculation of the associated conditional probability tables (CPTs).
• The proposed SAHBN model lays a solid foundation for integrating other semantic relations such as equivalence, disjointness, intersection and union.
18

Zhu, Cheng. "Efficient network based approaches for pattern recognition and knowledge discovery from large and heterogeneous datasets." University of Cincinnati / OhioLINK, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1378215769.

19

Li, Xin. "Graph-based learning for information systems." Diss., The University of Arizona, 2009. http://hdl.handle.net/10150/193827.

Abstract:
The advance of information technologies (IT) makes it possible to collect massive amounts of data in business applications and information systems. The increasing data volumes require more effective knowledge discovery techniques to make the best use of the data. This dissertation focuses on knowledge discovery on graph-structured data, i.e., graph-based learning. In this study, graph-structured data refers to data instances with relational information indicating their interactions. Graph-structured data exist in a variety of application areas related to information systems, such as business intelligence, knowledge management, e-commerce, and medical informatics. Developing knowledge discovery techniques on graph-structured data is critical to decision making and the reuse of knowledge in business applications. In this dissertation, I propose a graph-based learning framework and identify four major knowledge discovery tasks using graph-structured data: topology description, node classification, link prediction, and community detection. I present a series of studies to illustrate these tasks and propose solutions for the example applications. For the topology description task, in Chapter 2 I examine the global characteristics of relations extracted from documents. Such relations are extracted using different information processing techniques and aggregated at different analytical unit levels. For the node classification task, Chapters 3 and 4 study the patent classification problem and the gene function prediction problem, respectively. In Chapter 3, I model knowledge diffusion and evolution with patent citation networks for patent classification. In Chapter 4, I extend the context assumption in previous research and model context graphs in gene interaction networks for gene function prediction. For the link prediction task, Chapter 5 presents an example application in recommendation systems.
I frame the recommendation problem as link prediction on user-item interaction graphs and propose capturing graph-related features to tackle it. Chapter 6 examines the community detection task in the context of online interactions; there, I propose taking advantage of the sentiments (agreements and disagreements) expressed in users' interactions to improve community detection. All these examples show that the graph representation allows the graph structure and node/link information to be utilized more effectively in addressing the four knowledge discovery tasks. In general, the graph-based learning framework contributes to the domain of information systems by categorizing related knowledge discovery tasks, promoting further use of the graph representation, and suggesting approaches for knowledge discovery on graph-structured data. In practice, the proposed framework can be used to develop a variety of IT artifacts that address critical problems in business applications.
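Framing recommendation as link prediction means scoring user-item pairs that are not yet linked. The abstract does not say which graph features were used, so the sketch below assumes one simple structural feature for a bipartite user-item graph: the number of length-3 paths (user, item, other user, candidate item). The function name `path3_scores` and the example edges are hypothetical.

```python
from collections import defaultdict


def path3_scores(edges, user):
    """Score items `user` has not interacted with by counting length-3 paths
    user -> item -> other user -> candidate item in a bipartite graph."""
    user_items = defaultdict(set)
    item_users = defaultdict(set)
    for u, i in edges:
        user_items[u].add(i)
        item_users[i].add(u)

    seen = user_items[user]
    scores = defaultdict(int)
    for item in seen:
        for other in item_users[item]:
            if other == user:
                continue
            for candidate in user_items[other]:
                if candidate not in seen:  # only predict missing links
                    scores[candidate] += 1
    return dict(scores)


edges = [("alice", "a"), ("alice", "b"), ("bob", "a"), ("bob", "c"),
         ("carol", "b"), ("carol", "c"), ("carol", "d")]
print(path3_scores(edges, "alice"))
```

Here item "c" is reachable from "alice" through two different neighbours and so outranks "d"; in a learning setup, a count like this would typically be one input feature among several rather than the final score.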
20

Jia, Tao. "Geospatial Knowledge Discovery using Volunteered Geographic Information : a Complex System Perspective." Doctoral thesis, KTH, Geodesi och geoinformatik, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-104783.

Abstract:
The continuous progression of urbanization has resulted in an increasing number of people living in cities or towns. In parallel, advances in technologies such as the Internet, telecommunications, and transportation have allowed for better connectivity among people. This has brought about drastic changes in urban systems in recent decades. From a social geographic perspective, these changes are primarily characterized by intensive contact among people and their interactions with the surrounding urban environment, which in turn lead to challenging problems such as traffic jams, environmental pollution, and urban sprawl. These problems have been reported to be heterogeneous and non-deterministic. Hence, to cope with them, massive amounts of geographic data are required to create new knowledge about urban systems. Given the rapid growth of Volunteered Geographic Information (VGI) in recent years, this thesis derives knowledge about urban systems from extensive VGI datasets from three sources: a highway dataset from the OpenStreetMap (OSM) project, a photo location dataset from the Flickr website, and GPS tracking datasets from volunteers, taxicabs, and air flights. The knowledge primarily relates to two issues of urban systems: urban space and the corresponding human dynamics. On the one hand, urban space acts as a carrier for associated geographic activities, and knowledge of it benefits our understanding of current social and economic problems in urban systems. On the other hand, human dynamics reflect human behavior in urban space, which leads to complex mobility or activity patterns. Investigating them makes it possible to derive the underlying driving forces, which is highly instructive for urban planning, traffic management, and infectious disease control. Therefore, to fully understand the two issues, this thesis investigates them from multiple aspects. The first issue is investigated from four aspects.
First, at the city level, the controversial topic of city size regularity is investigated in terms of natural cities, and the conclusion is that Zipf’s law holds stably for all US cities. Second, at the sub-city level, the size distribution of spatial units within different cities, in terms of the clusters formed by street nodes, photo locations, and taxi static points, is explored, and the result shows a remarkable scaling property of these spatial units. Third, building on the scaling property of urban space at the city and sub-city levels, this thesis devises a novel tool that classifies cities into three categories: compact cities, normal cities, and sprawling cities. The tool is then applied to cities in both the US and three European countries. Finally, another representation of urban space is taken into account, namely the transportation network. The findings show that the US airport network displays scale-free, small-world, and disassortative-mixing properties, and that individual natural airports show heterogeneous patterns that are probably subject to geographic constraints and socioeconomic factors. The second issue is examined from four perspectives. First, at the city level, the movement flow generated by agents following two types of behavior is investigated through an agent-based simulation, and the results suggest that human mobility behavior is mainly shaped by the underlying street network. Second, at the country level, this thesis reports that human travel length by air can be approximated well by an exponential distribution, and subsequent simulations indicate that human mobility behavior is largely constrained by the underlying airport network. Third, at the regional level, the length that humans travel by car is shown to agree well with a power-law distribution with an exponential cutoff, and a subsequent simulation reproduces this Lévy flight characteristic.
Based on the simulation, human mobility behavior is again revealed to be primarily shaped by the underlying hierarchical spatial structure. Finally, taxicab static points are used to explore human activity patterns, which are characterized by regularity in space and time and by heterogeneity and predictability in space. From a complex system perspective, this thesis presents the knowledge discovered in urban systems using massive volumes of geographic data. Together with new knowledge from empirical findings, the development of methods, and the design of theoretical models, this thesis also shares with the research community the geographic data generated from extensive VGI datasets and the corresponding source code. Moreover, this study is aligned with a paradigm shift in that it analyzes large datasets using high processing power as opposed to analyzing small datasets with low processing power.
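Zipf's law for city sizes states that a city's size is roughly proportional to its rank raised to the power -α, with α close to 1. The sketch below estimates α with an ordinary least-squares fit of log size against log rank. This simple estimator is known to be biased and is chosen only for illustration; the abstract does not say how the thesis tested the law, and the function name and data are hypothetical.

```python
import math


def zipf_exponent(sizes):
    """Estimate alpha in size ~ rank**(-alpha) by least squares on log-log rank-size data."""
    if len(sizes) < 2 or any(s <= 0 for s in sizes):
        raise ValueError("need at least two positive sizes")
    ranked = sorted(sizes, reverse=True)              # rank 1 = largest city
    xs = [math.log(r) for r in range(1, len(ranked) + 1)]
    ys = [math.log(s) for s in ranked]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return -cov / var                                 # slope is -alpha


# Hypothetical, perfectly Zipfian sizes: the estimate should be (numerically) 1
sizes = [1_000_000 / r for r in range(1, 51)]
print(round(zipf_exponent(sizes), 6))
```

An estimate near 1 on real data is consistent with Zipf's law; a rigorous test would also compare the fit against alternative distributions.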

21

Chuddher, Bilal Akbar. "A novel knowledge discovery based approach for supplier risk scoring with application in the HVAC industry." Thesis, Brunel University, 2015. http://bura.brunel.ac.uk/handle/2438/11628.

Abstract:
This research has produced a novel methodology for assessing and quantifying supply risks in the supply chain. It builds on advanced Knowledge Discovery techniques and has resulted in a software implementation of the methodology. The methodology resembles the well-known consumer credit scoring methods in that it yields a similar metric, or score, for assessing a supplier’s reliability and the risk of conducting business with that supplier. However, it focuses on a wide range of operational metrics rather than only the financial ones on which credit scoring techniques typically concentrate. The core of the methodology is the application of Knowledge Discovery techniques to extract the likelihood of possible risks from a range of available datasets. In combination with cross-impact analysis, those datasets are examined to establish the inter-relationships and mutual connections among several factors that are likely to contribute to the risks associated with particular suppliers. This approach is called conjugation analysis. The resulting parameters become the inputs to a logistic regression, which leads to a risk scoring model; the outcome of the process is a standardized risk score analogous to the well-known consumer risk score, better known as the FICO score. The proposed methodology has been applied to an air-conditioning manufacturing company, and two models have been developed. The first identifies supply risks based on data about purchase orders and selected risk factors; with this model, the likelihoods of delivery failures, quality failures and cost failures are obtained. The second model builds on the first but also uses actual data about supplier performance to identify the risks of conducting business with particular suppliers. Its aim is to provide quantitative measures of an individual supplier’s risk level.
The supplier risk scoring model was tested on data acquired from the company to analyze its performance. It achieved 86.2% accuracy, with an area under the curve (AUC) of 0.863, well above the model-validity threshold of 0.5. This indicates the developed model's validity and reliability for future data. Numerical studies conducted with real-life datasets demonstrate the effectiveness of the proposed methodology and system, as well as its potential for industrial adoption.
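The scoring pipeline summarized above (a logistic-regression probability mapped to a standardized, FICO-like score and validated by AUC) can be sketched as follows. The "points to double the odds" scaling in `risk_score` is a common scorecard convention, not necessarily the one used in the thesis. The base score of 600, base odds of 50, the 20-point scaling factor, and all function names are hypothetical. The AUC is computed as the probability that a randomly chosen failing supplier receives a higher risk probability than a randomly chosen reliable one, with ties counted as one half. Under this definition, 0.5 corresponds to random guessing.

```python
import math


def failure_probability(features, coefficients, intercept):
    """Logistic regression: P(failure) = 1 / (1 + exp(-z))."""
    z = intercept + sum(w * x for w, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-z))


def risk_score(p_failure, base_score=600.0, base_odds=50.0, pdo=20.0):
    """Map a failure probability to a FICO-like score (higher = safer).

    The score equals base_score when the good:bad odds equal base_odds
    and rises by `pdo` points each time those odds double (assumed convention).
    """
    odds_good = (1.0 - p_failure) / p_failure
    factor = pdo / math.log(2)
    return base_score + factor * math.log(odds_good / base_odds)


def auc(scores_positive, scores_negative):
    """Rank-based AUC: chance that a positive outranks a negative (ties = 0.5)."""
    wins = 0.0
    for sp in scores_positive:
        for sn in scores_negative:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))


print(round(risk_score(1 / 51), 1))  # 600.0 (odds of 50:1 map to the base score)
```

The pairwise AUC loop is quadratic in the number of suppliers. A sort-based Mann-Whitney computation scales better but is harder to read in a sketch.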
22

Zhao, Wei. "Feature-Based Hierarchical Knowledge Engineering for Aircraft Life Cycle Design Decision Support." Diss., Georgia Institute of Technology, 2007. http://hdl.handle.net/1853/14639.

Abstract:
The design process of aerospace systems is becoming increasingly complex. As the process becomes enterprise-wide, it involves multiple vendors and encompasses the entire life cycle of the system, as well as a system-of-systems perspective. The amount of data and information generated under this paradigm has increased exponentially, creating difficulties for data storage, management, and retrieval. Furthermore, in most cases the data themselves are not suitable or adequate for use and must be translated into knowledge at a proper level of abstraction. Adding to the problem, the knowledge discovery process needed to handle the growing data in aerospace systems design has not been developed to the appropriate level. In fact, important design decisions are often made without sufficient understanding of their overall impact on the aircraft's life, because the data have not been converted and interpreted efficiently and in time to support design. To adapt the design process to life-cycle-centric requirements, this thesis proposes a methodology that provides the supporting knowledge needed for better design decision making. The primary contribution is a knowledge engineering framework for design decision support that effectively discovers knowledge from existing data and efficiently manages and presents that knowledge throughout all phases of the aircraft life cycle. The second contribution is a methodology for feature generation and exploration, which is used to significantly improve the knowledge discovery process. In addition, the work demonstrates several multimedia-based approaches to knowledge presentation.
23

Bose, Aishwarya. "Effective web service discovery using a combination of a semantic model and a data mining technique." Thesis, Queensland University of Technology, 2008. https://eprints.qut.edu.au/26425/1/Aishwarya_Bose_Thesis.pdf.

Abstract:
With the advent of Service-Oriented Architecture, Web services have gained tremendous popularity. Because so many Web services are available, finding one that meets a user's requirements is a challenge, which calls for an effective and reliable process of Web service discovery. A considerable body of research has developed methods to improve the accuracy of Web service discovery in matching the best service. Web service discovery typically suggests many individual services, each of which only partially fulfils the user’s interest. Considering the semantic relationships of the words used to describe the services, as well as their input and output parameters, can lead to accurate Web service discovery. Appropriately linking the individually matched services should then fully satisfy the user's requirements. This research proposes to integrate a semantic model and a data mining technique to enhance the accuracy of Web service discovery, through a novel three-phase methodology. The first phase performs match-making to find Web services that are semantically similar to a user query. To perform semantic analysis on the content of the Web Services Description Language (WSDL) document, a support-based latent semantic kernel is constructed using an innovative concept of binning and merging on a large quantity of text documents covering diverse knowledge domains. Using a generic latent semantic kernel constructed with a large number of terms helps to uncover hidden meanings of the query terms that could not otherwise be found. Sometimes a single Web service cannot fully satisfy the user's requirements; in such cases, a composition of multiple inter-related Web services is presented to the user. The second phase checks whether multiple Web services can be linked.
Once the feasibility of linking Web services is established, the objective is to provide the user with the best composition of Web services. In this link analysis phase, the Web services are modelled as nodes of a graph, and an all-pairs shortest-path algorithm is applied to find the optimum path at the minimum traversal cost. The third phase, system integration, combines the results of the two preceding phases using an original fusion algorithm in the fusion engine. Finally, the recommendation engine, an integral part of the system integration phase, makes the final recommendations, including individual and composite Web services, to the user. To evaluate the performance of the proposed method, extensive experiments were performed. The results of the proposed support-based semantic kernel method are compared with those of a standard keyword-based information-retrieval method and a clustering-based machine-learning method of Web service discovery. The proposed method outperforms both. Experimental results and statistical analysis also show that the best Web service compositions are obtained when 10 to 15 of the Web services found in phase I are considered for linking. Empirical results further confirm that the fusion engine boosts the accuracy of Web service discovery by systematically combining the inputs from the semantic analysis (phase I) and the link analysis (phase II). Overall, the proposed method significantly improves the accuracy of Web service discovery over traditional discovery methods.
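The link-analysis phase above models services as graph nodes and applies an all-pairs shortest-path algorithm. The sketch below uses the Floyd-Warshall algorithm, one standard all-pairs method. The abstract does not name the specific algorithm, so this choice is an assumption. The service names and costs are hypothetical, and an edge from A to B is assumed to mean that A's outputs can feed B's inputs at that cost.

```python
import math


def floyd_warshall(nodes, edges):
    """All-pairs shortest paths with next-hop tracking.

    nodes: list of node names; edges: dict {(u, v): cost} for directed links.
    Returns (dist, nxt): dist[u][v] is the minimum cost from u to v, and
    nxt[u][v] is the next hop on that path (None if v is unreachable).
    """
    dist = {u: {v: (0.0 if u == v else math.inf) for v in nodes} for u in nodes}
    nxt = {u: {v: (v if u == v else None) for v in nodes} for u in nodes}
    for (u, v), cost in edges.items():
        dist[u][v] = cost
        nxt[u][v] = v
    for k in nodes:  # allow service k as an intermediate step
        for i in nodes:
            for j in nodes:
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
                    nxt[i][j] = nxt[i][k]
    return dist, nxt


def composition(nxt, source, target):
    """Recover the chain of services from source to target ([] if none)."""
    if nxt[source][target] is None:
        return []
    path = [source]
    while source != target:
        source = nxt[source][target]
        path.append(source)
    return path


# Hypothetical services and linking costs.
services = ["Geocode", "Weather", "Forecast", "Alert"]
links = {
    ("Geocode", "Weather"): 1.0,
    ("Weather", "Forecast"): 2.0,
    ("Geocode", "Forecast"): 5.0,
    ("Forecast", "Alert"): 1.0,
}
dist, nxt = floyd_warshall(services, links)
print(composition(nxt, "Geocode", "Alert"), dist["Geocode"]["Alert"])
# ['Geocode', 'Weather', 'Forecast', 'Alert'] 4.0
```

Floyd-Warshall runs in cubic time in the number of services. That cost is acceptable for the 10 to 15 candidate services the abstract reports as the best range for linking.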
24

Bose, Aishwarya. "Effective web service discovery using a combination of a semantic model and a data mining technique." Queensland University of Technology, 2008. http://eprints.qut.edu.au/26425/.

25

Yang, Wanzhong. "Granule-based knowledge representation for intra and inter transaction association mining." Thesis, Queensland University of Technology, 2009. https://eprints.qut.edu.au/30398/1/Wanzhong_Yang_Thesis.pdf.

Abstract:
With the phenomenal growth of electronic data and information, there is strong demand for efficient and effective systems (tools) to perform data mining tasks on multidimensional databases. Association rules describe associations between items in the same transaction (intra-transaction) or in different transactions (inter-transaction). Association mining attempts to find interesting or useful association rules in databases, which is crucial for applying data mining in the real world. Association mining can be used in many application areas, such as discovering associations between customers’ locations and shopping behaviours in market basket analysis. Association mining has two phases. The first phase, called pattern mining, discovers frequent patterns. The second phase, called rule generation, discovers interesting and useful association rules in the discovered patterns. However, the first phase often takes a long time to find all frequent patterns, which also contain much noise. The second phase is also time-consuming and can generate many redundant rules. To improve the quality of association mining in databases, this thesis provides an alternative technique, granule-based association mining, for knowledge discovery in databases, where a granule is a predicate that describes the common features of a group of transactions. The new technique first transforms transaction databases into basic decision tables, then uses multi-tier structures to integrate pattern mining and rule generation into one phase for both intra- and inter-transaction association rule mining. To evaluate the proposed technique, this research defines the concept of meaningless rules by considering the correlations between data dimensions in intra-transaction association rule mining. It also uses precision to evaluate the effectiveness of inter-transaction association rules.
The experimental results show that the proposed technique is promising.
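The conventional two-phase process that this thesis sets out to improve (frequent-pattern mining followed by rule generation) can be sketched as below. This is a plain Apriori-style baseline, not the proposed granule-based technique. The support and confidence thresholds, the function names, and the toy transactions are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Phase 1: level-wise (Apriori-style) search for frequent itemsets.

    Returns {frozenset(itemset): support}, where support is the fraction
    of transactions that contain the itemset.
    """
    n = len(transactions)
    baskets = [frozenset(t) for t in transactions]
    items = {item for basket in baskets for item in basket}
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    size = 1
    while candidates:
        level = {}
        for cand in candidates:
            support = sum(cand <= basket for basket in baskets) / n
            if support >= min_support:
                level[cand] = support
        frequent.update(level)
        size += 1
        # Join frequent itemsets of this level to form the next, larger candidates.
        keys = list(level)
        candidates = {a | b for a in keys for b in keys if len(a | b) == size}
    return frequent


def association_rules(frequent, min_confidence):
    """Phase 2: rules X -> Y with confidence = support(X u Y) / support(X)."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                # Every subset of a frequent itemset is frequent, so lhs is in `frequent`.
                confidence = support / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((set(lhs), set(itemset - lhs), confidence))
    return rules


transactions = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
for lhs, rhs, conf in association_rules(frequent_itemsets(transactions, 0.5), 0.6):
    print(sorted(lhs), "->", sorted(rhs), round(conf, 2))
```

The two separate passes make visible the costs the abstract criticises. Phase 1 rescans every transaction for each candidate, and phase 2 enumerates every subset of every frequent itemset, which is where redundant rules come from.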
26

Yang, Wanzhong. "Granule-based knowledge representation for intra and inter transaction association mining." Queensland University of Technology, 2009. http://eprints.qut.edu.au/30398/.

27

Cicek, A. Ercument. "Metabolic Network-Based Analyses of Omics Data." Case Western Reserve University School of Graduate Studies / OhioLINK, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=case1372866879.

28

Qu, Xiaoyan Angela. "Discovery and Prioritization of Drug Candidates for Repositioning Using Semantic Web-based Representation of Integrated Diseasome-Pharmacome Knowledge." University of Cincinnati / OhioLINK, 2009. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1254403900.

29

Raje, Satyajeet. "ResearchIQ: An End-To-End Semantic Knowledge Platform For Resource Discovery in Biomedical Research." The Ohio State University, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=osu1354657305.

30

Seyedarabi, Faezeh. "Developing a model of teachers' web-based information searching : a study of search options and features to support personalised educational resource discovery." Thesis, University College London (University of London), 2013. http://discovery.ucl.ac.uk/10018062/.

Abstract:
This study investigated the search options and features that teachers use, and would prefer to have, when personalising their online searches for teaching resources. It focused on making web searching easier for UK teacher practitioners at primary, secondary and post-compulsory levels. A triangulated mixed-method approach was carried out in a two-phase iterative case study involving 75 teacher practitioners working in the UK educational setting. In this case study, a sequential evidence-gathering method called ‘System Development Life Cycle’ (SDLC) was adapted to link findings from structured questionnaires, observations and semi-structured interviews in order to design, develop and test two versions of an experimental search tool called “PoSTech!”. This research contributes to knowledge by offering a model of teachers’ web information needs and search behaviour. The model lists, in order of popularity, the twelve search options and features most used by teachers when personalising their search for online teaching resources via the revised search tool. A search option is something the teacher selects, and a feature is a characteristic of an option that teachers experience. Examples include the search options ‘Subject’, ‘Age Group’, ‘Resource Type’, ‘Free and/or Paid resources’ and ‘Search results language’, and the search feature that stores the search options selected by individual teachers and their returned results. The model of teachers’ web information needs and search behaviour could be used by the Government, teacher trainers and search engine designers to gain insight into how teachers search for online teaching resources, and so to help tackle the technical barriers teachers face when using the internet.
In conclusion, the research presented in this thesis has taken initial and important steps towards understanding the web information needs and search behaviours of individual teachers working in the UK educational setting.
31

Ibaraki, Toshihide, Endre Boros, Mutsunori Yagiura, and Kazuya Haraguchi. "A Randomness Based Analysis on the Data Size Needed for Removing Deceptive Patterns." Institute of Electronics, Information and Communication Engineers, 2008. http://hdl.handle.net/2237/15011.

32

Dam, Hai Huong (Information Technology & Electrical Engineering, Australian Defence Force Academy, UNSW). "A scalable evolutionary learning classifier system for knowledge discovery in stream data mining." Awarded by: University of New South Wales - Australian Defence Force Academy, 2008. http://handle.unsw.edu.au/1959.4/38865.

Abstract:
Data mining (DM) is the process of finding patterns and relationships in databases. Breakthroughs in computer technology have triggered massive growth in the data collected and maintained by organisations. In many applications, these data arrive continuously in large volumes as a sequence of instances known as a data stream, and mining them is known as stream data mining. Because of the large amount of data arriving in a data stream, each record is normally expected to be processed only once. Moreover, this processing can be carried out simultaneously at different sites in the organisation, making the problem distributed in nature. Distributed stream data mining poses many challenges to the data mining community, including scalability and coping with changes in the underlying concept over time. In this thesis, the author hypothesizes that learning classifier systems (LCSs), a class of classification algorithms, have the potential to work efficiently in distributed stream data mining. LCSs are incremental learners and, being evolutionary, are inherently adaptive. However, they suffer from two main drawbacks that hinder their use as fast data mining algorithms. First, they require a large population size, which slows down the processing of arriving instances. Second, they require a large number of parameter settings, some of which are very sensitive to the nature of the learning problem. As a result, it is difficult to choose the right setup for completely unknown problems. The aim of this thesis is to address these two problems in LCSs, with a specific focus on UCS, a supervised evolutionary learning classifier system. UCS is chosen because it has been tested extensively on classification tasks and is the supervised version of XCS, a state-of-the-art LCS. The thesis first introduces an architectural design for a distributed stream data mining system.
The problems that UCS faces in a distributed data stream task are confirmed through a large number of experiments with UCS and the proposed architectural design. To overcome the problem of large population sizes, the thesis proposes using a neural network to represent the action in UCS. This new system, called NLCS, was validated experimentally using a small fixed population size and showed a large reduction in the population size needed to learn the underlying concept in the data. An adaptive version of NLCS, called ANCS, is then introduced; it dynamically controls the population size of NLCS. A comprehensive analysis of the behaviour of ANCS revealed interesting patterns in the behaviour of the parameters, which motivated an ensemble version of the algorithm with 9 nodes, each using a different parameter setting. Together they cover all the patterns of behaviour observed in the system. A voting gate is used for the ensemble. The resulting ensemble does not require any parameter setting and showed better performance on all datasets tested. The thesis concludes by testing the ANCS system in the architectural design for distributed environments proposed earlier. The contributions of the thesis are: (1) reducing the UCS population size by an order of magnitude using a neural representation; (2) introducing a mechanism for adapting the population size; (3) proposing an ensemble method that does not require parameter setting; and, primarily, (4) showing that the proposed LCS can work efficiently for distributed stream data mining tasks.
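The ensemble described above combines nine ANCS nodes, each with a different parameter setting, through a voting gate. The abstract does not specify how the gate works. The sketch below assumes simple majority voting, with ties going to the label that was voted for first. The function name `voting_gate` and the threshold-rule members are hypothetical stand-ins for trained ANCS nodes.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence

# Any callable mapping a feature vector to a class label can act as a member.
Classifier = Callable[[Sequence[float]], Hashable]


def voting_gate(members: Sequence[Classifier], instance: Sequence[float]) -> Hashable:
    """Return the majority label predicted by the ensemble members.

    Counter.most_common orders equal counts by first insertion, so a tie
    goes to the label that was voted for first (in member order).
    """
    votes = Counter(member(instance) for member in members)
    return votes.most_common(1)[0][0]


# Hypothetical stand-ins for nine nodes with different parameter settings:
# each one thresholds the first feature at a different value.
members = [lambda x, t=t: int(x[0] > t) for t in (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)]
print(voting_gate(members, [0.65]))  # 1 (six of the nine members vote 1)
```

Because the gate only sees predicted labels, it adds no parameters of its own. This is consistent with the abstract's claim that the ensemble does not require parameter setting.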
33

Cun, Yupeng. "Network-Based Biomarker Discovery: Development of Prognostic Biomarkers for Personalized Medicine by Integrating Data and Prior Knowledge." Bonn: Universitäts- und Landesbibliothek Bonn, 2014. http://d-nb.info/1051027977/34.

34

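Fornells Herrera, Albert. "Marc integrador de les capacitats de Soft-Computing i de Knowledge Discovery dels Mapes Autoorganitzatius en el Raonament Basat en Casos" [An integrating framework for the Soft-Computing and Knowledge Discovery capabilities of Self-Organizing Maps in Case-Based Reasoning]. Doctoral thesis, Universitat Ramon Llull, 2007. http://hdl.handle.net/10803/9158.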

Abstract:
El Raonament Basat en Casos (CBR) és un paradigma d'aprenentatge basat en establir analogies amb problemes prèviament resolts per resoldre'n de nous. Per tant, l'organització, l'accés i la utilització del coneixement previ són aspectes claus per tenir èxit en aquest procés. No obstant, la majoria dels problemes reals presenten grans volums de dades complexes, incertes i amb coneixement aproximat i, conseqüentment, el rendiment del CBR pot veure's minvat degut a la complexitat de gestionar aquest tipus de coneixement. Això ha fet que en els últims anys hagi sorgit una nova línia de recerca anomenada Soft-Computing and Intelligent Information Retrieval enfocada en mitigar aquests efectes. D'aquí neix el context d'aquesta tesi.
Dins de l'ampli ventall de tècniques Soft-Computing per tractar coneixement complex, els Mapes Autoorganitzatius (SOM) destaquen sobre la resta per la seva capacitat en agrupar les dades en patrons, els quals permeten detectar relacions ocultes entre les dades. Aquesta capacitat ha estat explotada en treballs previs d'altres investigadors, on s'ha organitzat la memòria de casos del CBR amb SOM per tal de millorar la recuperació dels casos.
La finalitat de la present tesi és donar un pas més enllà en la simple combinació del CBR i de SOM, de tal manera que aquí s'introdueixen les capacitats de Soft-Computing i de Knowledge Discovery de SOM en totes les fases del CBR per nodrir-les del nou coneixement descobert. A més a més, les mètriques de complexitat apareixen en aquest context com un instrument precís per modelar el funcionament de SOM segons la tipologia de les dades. L'assoliment d'aquesta integració es pot dividir principalment en quatre fites: (1) la definició d'una metodologia per determinar la millor manera de recuperar els casos tenint en compte la complexitat de les dades i els requeriments de l'usuari; (2) la millora de la fiabilitat de la proposta de solucions gràcies a les relacions entre els clústers i els casos; (3) la potenciació de les capacitats explicatives mitjançant la generació d'explicacions simbòliques; (4) el manteniment incremental i semi-supervisat de la memòria de casos organitzada per SOM.
Tots aquests punts s'integren sota la plataforma SOMCBR, la qual és extensament avaluada sobre datasets provinents de l'UCI Repository i de dominis mèdics i telemàtics.
Addicionalment, la tesi aborda de manera secundària dues línies de recerca fruït dels requeriments dels projectes on ha estat ubicada. D'una banda, s'aborda la definició de funcions de similitud específiques per definir com comparar un cas resolt amb un de nou mitjançant una variant de la Computació Evolutiva anomenada Evolució de Gramàtiques (GE). D'altra banda, s'estudia com definir esquemes de cooperació entre sistemes heterogenis per millorar la fiabilitat de la seva resposta conjunta mitjançant GE. Ambdues línies són integrades en dues plataformes, BRAIN i MGE respectivament, i són també avaluades amb els datasets anteriors.
El Razonamiento Basado en Casos (CBR) es un paradigma de aprendizaje basado en establecer analogías con problemas previamente resueltos para resolver otros nuevos. Por tanto, la organización, el acceso y la utilización del conocimiento previo son aspectos clave para tener éxito. No obstante, la mayoría de los problemas presentan grandes volúmenes de datos complejos, inciertos y con conocimiento aproximado y, por tanto, el rendimiento del CBR puede verse afectado debido a la complejidad de gestionarlos. Esto ha hecho que en los últimos años haya surgido una nueva línea de investigación llamada Soft-Computing and Intelligent Information Retrieval focalizada en mitigar estos efectos. Es aquí donde nace el contexto de esta tesis.
Dentro del amplio abanico de técnicas Soft-Computing para tratar conocimiento complejo, los Mapas Autoorganizativos (SOM) destacan por encima del resto por su capacidad de agrupar los datos en patrones, los cuales permiten detectar relaciones ocultas entre los datos. Esta capacidad ha sido aprovechada en trabajos previos de otros investigadores, donde se ha organizado la memoria de casos del CBR con SOM para mejorar la recuperación de los casos.
La finalidad de la presente tesis es dar un paso más en la simple combinación del CBR y de SOM, de tal manera que aquí se introducen las capacidades de Soft-Computing y de Knowledge Discovery de SOM en todas las fases del CBR para alimentarlas del conocimiento nuevo descubierto. Además, las métricas de complejidad aparecen en este contexto como un instrumento preciso para modelar el funcionamiento de SOM en función de la tipología de los datos. La consecución de esta integración se puede dividir principalmente en cuatro hitos: (1) la definición de una metodología para determinar la mejor manera de recuperar los casos teniendo en cuenta la complejidad de los datos y los requerimientos del usuario; (2) la mejora de la fiabilidad en la propuesta de soluciones gracias a las relaciones entre los clusters y los casos; (3) la potenciación de las capacidades explicativas mediante la generación de explicaciones simbólicas; (4) el mantenimiento incremental y semi-supervisado de la memoria de casos organizada por SOM. Todos estos puntos se integran en la plataforma SOMCBR, la cual es ampliamente evaluada sobre datasets procedentes del UCI Repository y de dominios médicos y telemáticos.
Adicionalmente, la tesis aborda secundariamente dos líneas de investigación fruto de los requeri-mientos de los proyectos donde ha estado ubicada la tesis. Por un lado, se aborda la definición de funciones de similitud específicas para definir como comparar un caso resuelto con otro nuevo mediante una variante de la Computación Evolutiva denominada Evolución de Gramáticas (GE). Por otro lado, se estudia como definir esquemas de cooperación entre sistemas heterogéneos para mejorar la fiabilidad de su respuesta conjunta mediante GE. Ambas líneas son integradas en dos plataformas, BRAIN y MGE, las cuales también son evaluadas sobre los datasets anteriores.
Case-Based Reasoning (CBR) is a machine learning approach that solves new problems by identifying analogies with previously solved problems. The organization, access and management of this knowledge are therefore crucial for achieving successful results. However, most real-world problems involve a huge amount of complex data that also carries uncertain and partial knowledge, so CBR performance depends on how well this knowledge is managed. For this reason, a new research topic has emerged in recent years to tackle this problem: Soft-Computing and Intelligent Information Retrieval. This is where this thesis was born.
Among the wide variety of Soft-Computing techniques for managing complex data, Self-Organizing Maps (SOM) stand out for their capability to group data according to patterns, using the relations hidden in the data. This capability has been exploited in a wide range of works in which the CBR case memory is organized with SOM to improve case retrieval.
The goal of this thesis is to go a step beyond the simple combination of CBR and SOM. It shows how to introduce the Soft-Computing and Knowledge Discovery capabilities of SOM into all the steps of CBR, so that each step is fed with the newly discovered knowledge. Furthermore, complexity measures appear in this context as an accurate instrument for modeling the performance of SOM according to the typology of the data. This integration can be divided into four main milestones: (1) the definition of a methodology for determining the best way to retrieve cases, taking into account data complexity and user requirements; (2) the improvement of classification reliability through the relations between cases and clusters; (3) the enhancement of explanatory capabilities through the generation of symbolic explanations; (4) the incremental and semi-supervised maintenance of the SOM-organized case memory. All these points are integrated in the SOMCBR framework, which has been extensively tested on datasets from the UCI Repository and from medical and telematic domains.
Additionally, this thesis tackles two secondary research lines arising from the requirements of the projects in which it was developed. First, it analyzes how to define domain-specific similarity functions, which compare a solved case with a new one, using a variant of Evolutionary Computation called Grammatical Evolution (GE). Second, it studies how to define, by means of GE, cooperation schemes between heterogeneous systems that improve the reliability of their joint response. Both lines are integrated in two frameworks, BRAIN and MGE respectively, which are also evaluated on the datasets mentioned above.
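The sketch below illustrates the general idea of a SOM-organized case memory. It is not the SOMCBR implementation. The map size, the learning-rate and neighbourhood schedules, and the helper names (`train_som`, `build_case_memory`, `retrieve`) are illustrative assumptions. Each case is indexed under its best-matching map unit. Retrieval first looks at the nearest non-empty unit to the query, then ranks the cases stored there by Euclidean distance.

```python
import math
import random

def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_som(data, rows=2, cols=2, epochs=50, lr0=0.5, seed=0):
    """Train a small rectangular SOM; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    sigma0 = max(rows, cols) / 2
    # Initialise every unit with a randomly chosen training vector.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):
            frac = step / total
            lr = lr0 * (1 - frac)                 # linearly decaying learning rate
            sigma = sigma0 * (1 - frac) + 1e-3    # shrinking neighbourhood radius
            bmu = min(weights, key=lambda u: dist2(weights[u], x))
            for u, w in weights.items():
                grid_d2 = (u[0] - bmu[0]) ** 2 + (u[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2 * sigma ** 2))  # Gaussian neighbourhood
                for i in range(len(w)):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights

def best_unit(weights, x):
    """Map unit whose weight vector is closest to x."""
    return min(weights, key=lambda u: dist2(weights[u], x))

def build_case_memory(cases, weights):
    """Index each (features, solution) case under its best-matching unit."""
    memory = {u: [] for u in weights}
    for features, solution in cases:
        memory[best_unit(weights, features)].append((features, solution))
    return memory

def retrieve(query, weights, memory, k=1):
    """Return the k cases closest to the query within the nearest non-empty unit."""
    for unit in sorted(weights, key=lambda u: dist2(weights[u], query)):
        if memory[unit]:
            return sorted(memory[unit], key=lambda c: dist2(c[0], query))[:k]
    return []
```

Searching only one unit keeps retrieval cheap on large case memories. The thesis additionally uses complexity measures to choose how to retrieve cases, which this sketch does not model.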
35

Maus, Aaron. "Formulation of Hybrid Knowledge-Based/Molecular Mechanics Potentials for Protein Structure Refinement and a Novel Graph Theoretical Protein Structure Comparison and Analysis Technique." ScholarWorks@UNO, 2019. https://scholarworks.uno.edu/td/2673.

Abstract:
Proteins are the fundamental machinery that enables the functions of life. Understanding them is critical not only for basic biology but also for enabling medical advances. The field of protein structure prediction is concerned with developing computational techniques to predict protein structure and function from a protein's amino acid sequence alone, which is encoded directly in DNA. Despite much progress since the first computational models in the late 1960s, protein structure prediction techniques still cannot reliably produce structures accurate enough for desired applications such as rational drug design. Protein structure refinement is the process of modifying a predicted model of a protein to bring it closer to its native state. This dissertation examines in detail one protein structure refinement technique: potential energy minimization using hybrid molecular mechanics/knowledge-based potential energy functions. The generation of the knowledge-based component is critically analyzed, and a potential that is a modest improvement over the original is presented. This dissertation also examines the task of protein structure comparison. When evaluating protein structure prediction techniques, it is crucial to compare the produced models against known structures to understand how well each technique performs. A novel technique is proposed that allows an in-depth yet intuitive evaluation of the local similarities between protein structures. Based on a graph analysis of pairwise atomic distance similarities, multiple regions of structural similarity can be identified between structures independently of their relative orientation. Multidomain structures can be evaluated, and the technique can be combined with global measures of similarity such as the global distance test.
This method of comparison is expected to have broad applications in rational drug design, the evolutionary study of protein structures, and in the analysis of the protein structure prediction effort.
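The sketch below illustrates the orientation-independent comparison described above. It makes three assumptions:

- Both structures have the same number of residues, in one-to-one correspondence.
- Two residues are linked when their pairwise distance is nearly the same in both structures, within a tolerance `tol` (an assumed parameter).
- Connected components of the resulting graph are reported as regions of local similarity.

Connected components are a simplification chosen for brevity. They are not necessarily the graph analysis used in the dissertation.

```python
import math
from itertools import combinations

def pairwise(coords):
    """Matrix of Euclidean distances between all residue coordinates."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]

def similarity_graph(a, b, tol=1.0):
    """Link residues i and j when their distance is preserved between a and b."""
    da, db = pairwise(a), pairwise(b)
    adj = {i: set() for i in range(len(a))}
    for i, j in combinations(range(len(a)), 2):
        if abs(da[i][j] - db[i][j]) <= tol:
            adj[i].add(j)
            adj[j].add(i)
    return adj

def similar_regions(a, b, tol=1.0, min_size=2):
    """Connected components of the similarity graph, as sorted residue indices."""
    adj = similarity_graph(a, b, tol)
    seen, regions = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:                      # iterative depth-first search
            u = stack.pop()
            component.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        if len(component) >= min_size:
            regions.append(sorted(component))
    return regions
```

The method compares only intra-structure distances. As a result, rotating or translating either structure leaves the output unchanged, and no superposition step is needed.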
36

He, Yuanchen. "Fuzzy-Granular Based Data Mining for Effective Decision Support in Biomedical Applications." Digital Archive @ GSU, 2006. http://digitalarchive.gsu.edu/cs_diss/12.

Abstract:
Due to the complexity of biomedical problems, adaptive and intelligent knowledge discovery and data mining systems are greatly needed to help humans understand the underlying mechanisms of diseases. For biomedical classification problems, it is typically impossible to build a perfect classifier with 100% prediction accuracy, so a more realistic target is to build an effective Decision Support System (DSS). In this dissertation, a novel adaptive Fuzzy Association Rules (FARs) mining algorithm, named FARM-DS, is proposed to build such a DSS for binary classification problems in the biomedical domain. Empirical studies show that FARM-DS is competitive with state-of-the-art classifiers in terms of prediction accuracy. More importantly, FARs can provide strong decision support for disease diagnosis because they are easy to interpret. This dissertation also proposes a fuzzy-granular method to select informative and discriminative genes from huge microarray gene expression datasets. Fuzzy granulation reduces the information lost during gene selection, so more informative genes for cancer classification are selected and more accurate classifiers can be built. Empirical studies show that the proposed method is more accurate than traditional algorithms for cancer classification, and the selected genes are therefore expected to be more helpful for further biological studies.
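The sketch below makes the notion of a fuzzy association rule concrete. It computes the fuzzy support and confidence of a single rule such as "gene expression is HIGH -> disease", using a triangular membership function and the minimum t-norm. These are standard textbook measures. The membership breakpoints, the attribute names (`expr`, `label`) and the records are hypothetical. This is not the FARM-DS algorithm itself, which additionally mines and adapts the rules.

```python
def triangular(a, b, c):
    """Triangular membership function rising from a, peaking at b, falling to c."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        if x <= b:
            return (x - a) / (b - a)
        return (c - x) / (c - b)
    return mu

def degree(record, conditions):
    """Degree to which a record satisfies all fuzzy conditions (minimum t-norm)."""
    return min(mu(record[attr]) for attr, mu in conditions)

def fuzzy_support(records, conditions):
    """Average membership of the records in the fuzzy itemset."""
    return sum(degree(r, conditions) for r in records) / len(records)

def fuzzy_confidence(records, antecedent, label_attr, label):
    """Confidence of 'antecedent -> label_attr == label' with a crisp class label."""
    joint = sum(min(degree(r, antecedent), 1.0 if r[label_attr] == label else 0.0)
                for r in records)
    body = sum(degree(r, antecedent) for r in records)
    return joint / body if body else 0.0
```

Each record contributes a partial membership rather than a yes/no count. A rule of this kind therefore stays readable while tolerating values near category boundaries, which fits the interpretability argument made in the abstract.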
37

Sirin, Göknur. "Supporting multidisciplinary vehicle modeling : towards an ontology-based knowledge sharing in collaborative model based systems engineering environment." Thesis, Châtenay-Malabry, Ecole centrale de Paris, 2015. http://www.theses.fr/2015ECAP0024/document.

Abstract:
Simulation models are widely used by industry as a decision-making aid to explore and optimize a broad range of complex industrial system architectures. The increasing complexity of industrial systems (cars, airplanes, etc.), together with ecological and economic concerns, creates a need to explore and analyze innovative system architectures efficiently and effectively by means of simulation models. However, simulation designers currently face limitations that make simulation models difficult to design and develop in a collaborative, multidisciplinary design environment. The multidisciplinary nature of simulation models requires a specific understanding of each phenomenon to be simulated and a thorough description of the system architecture, its components and the connections between components. The objective of this thesis is twofold. The first is to develop a methodology and tools to build accurate simulation models. The second is to introduce an innovative approach to design, produce and integrate simulation models in a "plug and play" manner while ensuring the expected model fidelity. To accomplish these objectives, Model-Based Systems Engineering (MBSE) and Information Systems (IS) methodologies were used to support the simulation designer's analysis capabilities in terms of methods, processes and design tool solutions. Today, one of the major challenges in full-vehicle simulation model creation is to obtain domain-level simulation models from different domain experts while detecting any potential inconsistency before the IVVQ (Integration, Verification, Validation, and Qualification) phase. In the current simulation model development process, most defects, such as interface mismatches and interoperability problems, are discovered late, during the IVVQ phase.
This can cause several kinds of waste, including rework and, perhaps most harmful, incorrect simulation models that are subsequently used as the basis for design decisions. To address this problem, this work aims to reduce late inconsistency detection by ensuring early-stage collaboration between the different suppliers and the OEM. To this end, it first adds a Detailed Model Design Phase to the current model development process, and then reorganizes the roles and delegates them among the design actors. Finally, an alternative architecture design tool is supported by an ontology-based DSL (Domain-Specific Language) called Model Identity Card (MIC). The design tools and the perspectives of the mentioned activities (e.g. decisions, views and viewpoints) are structured following Enterprise Architecture Frameworks. To demonstrate the applicability of the proposed solution, engine after-treatment, hybrid parallel propulsion and electric transmission models are tested across the automotive and aeronautic industries.
38

Verma, Anju. "Ontology based personalized modeling for chronic disease risk evaluation and knowledge discovery: an integrated approach." Thesis (PhD), Auckland University of Technology, 2009. http://hdl.handle.net/10292/784.

Abstract:
Populations are aging, and the prevalence of chronic diseases, which persist for many years, is increasing. The most common non-communicable chronic diseases in developed countries are cardiovascular disease (CVD), type 2 diabetes, obesity, arthritis and specific cancers. Chronic diseases such as cardiovascular disease, type 2 diabetes and obesity have high prevalence and develop over the course of life due to a number of interrelated factors, including genetic predisposition, nutrition and lifestyle. With the development and completion of human genome sequencing, we are able to trace the genes responsible for proteins and metabolites linked with these diseases. A computerized model focused on organizing knowledge related to genes, nutrition and three chronic diseases, namely cardiovascular disease, type 2 diabetes and obesity, has been developed for the Ontology-Based Personalized Risk Evaluation for Chronic Disease Project. This model is a Protégé-based ontological representation developed for entering and linking concepts and data for these three chronic diseases, and it facilitates the identification of interrelationships between concepts. The ontological representation provides a framework into which information on individual patients, disease symptoms, gene maps, diet and life history can be input, and from which risks, profiles and recommendations can be derived. Personal genome and health data could guide the design and construction of a medical health administration system for scheduling relevant annual medical tests, e.g. of gene expression level changes for health surveillance. One method, called the transductive neuro-fuzzy inference system with weighted data normalization, is used to evaluate the personalized risk of chronic disease. This personalized approach has been applied to two chronic diseases: predicting the risk of cardiovascular disease and predicting the risk of type 2 diabetes.
For predicting the risk of cardiovascular disease, data from the National Nutrition Health Survey 97 of the New Zealand population were used. These data contain clinical, anthropometric and nutritional variables. For predicting the risk of type 2 diabetes, data from an Italian population with clinical and genetic variables were used. It was found that the genes responsible for type 2 diabetes differ between male and female samples. A framework integrating the personalized model and the chronic disease ontology is also developed. Its aims are to support further discovery through the integrated ontological representation and to build an expert system on genes of interest and relevant dietary components.
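The transductive idea is to build a separate model around each new individual instead of one global model. The sketch below is a deliberately simplified stand-in for the transductive neuro-fuzzy inference system with weighted data normalization named above. It applies min-max normalization with assumed per-variable weights. The risk is then an inverse-distance-weighted average of the outcomes of the k nearest training samples. The variables, weights and data are hypothetical.

```python
import math

def minmax_params(rows):
    """Per-variable (min, max) computed from the training rows."""
    return [(min(col), max(col)) for col in zip(*rows)]

def normalize(x, params):
    """Scale each variable to [0, 1]; constant variables map to 0."""
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(x, params)]

def personalized_risk(x_new, X, y, weights, k=3):
    """Estimate risk for one individual from its k nearest neighbours.

    weights: assumed importance of each variable in the distance.
    y: observed outcomes (0 = no disease, 1 = disease) for the rows of X.
    """
    params = minmax_params(X)
    q = normalize(x_new, params)
    scored = []
    for xi, yi in zip(X, y):
        z = normalize(xi, params)
        d = math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, q, z)))
        scored.append((d, yi))
    scored.sort(key=lambda t: t[0])
    neighbours = scored[:k]
    # Inverse-distance weights; the small constant avoids division by zero.
    inv = [1.0 / (d + 1e-9) for d, _ in neighbours]
    return sum(w * yi for w, (_, yi) in zip(inv, neighbours)) / sum(inv)
```

The neighbourhood is recomputed for every query, so each individual effectively gets a model fitted to the samples most similar to them. This is the sense in which the approach is "personalized".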
39

Crowe, Edward R. "A strategy for the synthesis of real-time statistical process control within the framework of a knowledge based controller." Ohio : Ohio University, 1995. http://www.ohiolink.edu/etd/view.cgi?ohiou1174336725.

40

Lindsey, Daniel Clayton. "A Geospatial Analysis of the Northeastern Plains Village Complex: An Exploration of a GIS-Based Multidisciplinary Method for the Incorporation of Western and Traditional Ecological Knowledge into the Discovery of Diagnostic Prehistoric Settlement Patterns." Thesis, North Dakota State University, 2019. https://hdl.handle.net/10365/31623.

Abstract:
This thesis analyzes how Traditional Ecological Knowledge (TEK) can be used to understand extant Northeastern Plains Village (NEPV) settlement strategies in aggregate. The purpose is to add a further verification metric to the archaeological classification system currently used to describe NEPV-associated sites. To accomplish this task, I extracted Traditional Ecological Knowledge from ethnographic sources for comparison with geospatial, geostatistical, and statistical analyses. My results show that the hierarchical clustering exhibited among NEPV sites is congruent with first-person narratives of habitation and resource collection activities in the pre-Reservation period (before AD 1880) within the research area. This study emphasizes the importance of incorporating Traditional Ecological Knowledge into material typological classification schemes for archaeological sites that are complicated by high rates of cultural transmission.
41

Giacometto, Torres Francisco Javier. "Adaptive load consumption modelling on the user side: contributions to load forecasting modelling based on supervised mixture of experts and genetic programming." Doctoral thesis, Universitat Politècnica de Catalunya, 2017. http://hdl.handle.net/10803/457631.

Abstract:
This research work makes three main contributions to the load forecasting field: enhanced forecasting accuracy, enhanced model adaptiveness, and automated execution of the implemented load forecasting strategies. Regarding accuracy, learning algorithms have been implemented based on machine learning, computational intelligence, evolvable networks, expert systems, and regression approaches. Ways to increase forecasting quality are also explored, namely minimizing the forecasting error and exploiting hidden insights and miscellaneous properties of the training data. These take the form of feature-based specialized base learners within an ensemble modelling structure. Preprocessing and knowledge discovery algorithms are also implemented. Preprocessing boosts accuracy by cleaning variables, and unsupervised intelligent algorithms enhance the autonomy of the modelling algorithm. Adaptability has been enhanced by implementing three components within an ensemble learning strategy. The first component is resampling techniques. These replicate the global probability distribution on multiple independent training subsets, so that base learners are trained on representative spaces of occurrences. The second component is multi-resolution and cyclical analysis techniques: decomposing endogenous variables into their time-frequency components yields major insights, which are applied to define the layout of the ensemble structure. The third component is self-organized modelling algorithms, which provide fully customized base learners. Autonomy is achieved by combining automatic procedures that minimize the interaction of an expert user with the forecasting procedure.
Experimental results obtained by applying the proposed load forecasting strategies demonstrate the suitability of the implemented techniques and methodologies, especially the novel ensemble learning strategy.
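The resampling component can be illustrated with a small bagging sketch. It rests on three assumptions:

- Each training subset is drawn by sampling with replacement within each consumption class, so class proportions are preserved exactly.
- The base learner is an ordinary least-squares line, not one of the specialized learners used in the thesis.
- The ensemble simply averages the base learners' forecasts.

The field names (`cls`, `x`, `y`) are hypothetical.

```python
import random
from collections import defaultdict

def stratified_bootstrap(samples, rng):
    """Resample with replacement inside each class, keeping class sizes unchanged."""
    by_class = defaultdict(list)
    for s in samples:
        by_class[s["cls"]].append(s)
    subset = []
    for members in by_class.values():
        subset.extend(rng.choices(members, k=len(members)))
    return subset

def fit_linear(subset):
    """Ordinary least-squares line y = a + b*x fitted to one training subset."""
    xs = [s["x"] for s in subset]
    ys = [s["y"] for s in subset]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0   # degenerate subset: fall back to a flat line
    a = my - b * mx
    return lambda x: a + b * x

def train_ensemble(samples, n_learners=5, seed=0):
    """Train one base learner per stratified bootstrap subset."""
    rng = random.Random(seed)
    return [fit_linear(stratified_bootstrap(samples, rng)) for _ in range(n_learners)]

def predict(ensemble, x):
    """Ensemble forecast: plain average of the base learners' forecasts."""
    return sum(model(x) for model in ensemble) / len(ensemble)
```

Sampling within each class means every base learner sees the same mix of consumption regimes as the full data set. This is one simple reading of the stated goal of replicating the global probability distribution on each subset.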
42

Ma, Sihui. "Discovery and dissemination of new knowledge in food science: Analytical methods for quantification of polyphenols and amino acids in fruits and the use of mobile phone-based instructional technology in food science education." Diss., Virginia Tech, 2019. http://hdl.handle.net/10919/100997.

Abstract:
The discovery and dissemination of new knowledge are essential in food science. To advance our understanding of fruit chemistry, analytical methods were compared and applied. Polyphenols are secondary metabolites in fruits of particular importance in food science, as they contribute to the sensory attributes and health benefits of the products. Common analytical methods for quantifying polyphenols, including the Folin-Ciocalteu (F-C), Lowenthal permanganate (L-P), 4-dimethylaminocinnamaldehyde (DMAC) and bovine serum albumin (BSA) precipitation methods, were evaluated using analytical method validation procedures. The F-C method was not specific to polyphenols, and the L-P method had the widest working range but lacked accuracy. The DMAC method was the most specific to flavanols, and the BSA method was not suitable for quantifying smaller flavanols. The quantitative performance of these four methods was evaluated using a broad range of fruit-derived samples. Variation in the quantitative results obtained with these four methods was explained by differences in the polyphenol and matrix composition of the samples and by differences in the operating principles of the methods. The reactivity of individual polyphenol compounds (catechin, epicatechin, PC B2, PC pentamer, chlorogenic acid, phloretin, and quercetin) was also assessed in four quantification methods for polyphenols and flavanols: Prussian blue (P-B), F-C, DMAC and BSA precipitation. Reactivity was found to differ by up to thirteen-fold, depending on the assay. Furthermore, a full factorial design was used to evaluate how polyphenol compounds (catechin, PC B2, and chlorogenic acid) contribute to, and interact in, the quantitative results of these methods. The same design covered potentially interfering compounds likely to be found in fruit and fruit products (ascorbic acid, glucose, and SO2). Significant interactions were found among polyphenol compounds and among interfering compounds.
The standardized coefficient (β) for all factors and interactions of polyphenol compounds varied from 0.347 to 129, and from near 0 to -46.8 for all factors and interactions of interfering compounds. Our findings indicate that the choice of standards and the polyphenol and matrix composition of the sample may cause disparities among the quantitative results of these methods. Amino acids in apple (Malus × domestica Borkh.) juice not only influence the quality of fermented cider through fermentation kinetics but also affect cider flavor through yeast metabolism. Thanks to recent advances in analytical instrumentation, amino acid profiles in apple juice were determined much faster and more accurately than with previously applied methods. Twenty amino acids were quantified by UPLC-PDA in juices from 13 apple cultivars grown in Virginia. The relative amino acid profile differed significantly among the apple juices evaluated. The total amino acid concentration ranged from 18 mg/L in Blacktwig juice to 57 mg/L in Enterprise juice. L-Asparagine, L-aspartic acid and L-glutamine were the principal amino acids observed in most apple juices. These results will inform future research on yeast metabolism and nitrogen management during cider fermentation. To better disseminate research knowledge to the next generation of food scientists, the effectiveness of a new instructional technology in food science education, a cellphone-based personal response system, was evaluated. Incorporating this technology into lectures improved students' academic performance, and students perceived it well (easy to use and positively impacting their learning). This finding contributes to the scholarship of teaching and learning in food science by providing useful insight into the potential of such tools to improve student engagement and learning outcomes.
Advances in food chemistry research will enable the development of value-added food products, and the pedagogical advancement in food science education will better convey new and existing knowledge to students, who will apply this knowledge to promote a safe and nutritious food supply that enhances human health and increases the value of specialty crops.
Doctor of Philosophy
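The sketch below shows how main-effect and interaction coefficients can be estimated for a two-level full factorial design in coded units (-1/+1). It uses a synthetic response with known coefficients instead of real assay data, and the factor names are only labels. Coefficients in coded units are one common kind of standardized coefficient. The abstract does not state which scaling the dissertation used, so the values reported above are not reproduced here.

```python
from itertools import combinations, product
from math import prod

def factorial_coefficients(factors, response):
    """Coefficients of a two-level full factorial design in coded units (-1/+1).

    response: function mapping {factor: -1 or +1} to the measured value.
    Returns the intercept plus one coefficient per main effect and interaction.
    """
    runs = list(product([-1, 1], repeat=len(factors)))   # all 2**k runs
    ys = [response(dict(zip(factors, run))) for run in runs]
    n = len(runs)
    coefs = {"intercept": sum(ys) / n}
    for order in range(1, len(factors) + 1):
        for combo in combinations(range(len(factors)), order):
            name = ":".join(factors[i] for i in combo)
            # Contrast column: product of the coded levels of the factors involved.
            column = [prod(run[i] for i in combo) for run in runs]
            # Columns are orthogonal, so each coefficient is a simple average.
            coefs[name] = sum(c * y for c, y in zip(column, ys)) / n
    return coefs
```

In this coding, the classical "effect" of a factor (mean response at the high level minus mean response at the low level) is twice its coefficient. Replicates and significance tests are omitted for brevity.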
43

Griffiths, Kerryn Eva. "Discovering, applying and integrating self-knowledge : a grounded theory study of learning in life coaching." Thesis, Queensland University of Technology, 2008. https://eprints.qut.edu.au/37245/1/Kerryn_Griffiths_Thesis.pdf.

Abstract:
Professional coaching is a rapidly expanding field with interdisciplinary roots and broad application. However, despite abundant prescriptive literature, research into the process of coaching, and especially life coaching, is minimal. Similarly, although learning is inherently recognised in the process of coaching, and coaching is increasingly recognised as a means of enhancing teaching and learning, the process of learning in coaching is little understood, and learning theory makes up only a small part of the evidence-based coaching literature. In this grounded theory study of life coaches and their clients, the process of learning in life coaching across a range of coaching models is examined and explained. The findings show how learning in life coaching emerged as a process of discovering, applying and integrating self-knowledge, which culminated in the development of self. This process occurred through eight key coaching processes shared between coaches and clients and drew on a multitude of learning theories.
44

Scarinci, Rui Gureghian. "SES : sistema de extração semântica de informações." Biblioteca Digital de Teses e Dissertações da UFRGS, 1997. http://hdl.handle.net/10183/18398.

Abstract:
One of the most challenging areas in Computer Science is related to Internet technology. This network offers users a large variety and amount of information, mainly data stored in unstructured or semi-structured formats. However, the vast volume and heterogeneity of the data make manipulating the retrieved data very arduous. This problem was the prime motivation of this work. Even with the help of many data retrieval and specific search tools, the user has to manipulate an increasing amount of information on their personal computer. These tools do not perform a precise data selection process, so much of the retrieved data is not of interest to the user. There is also a great diversity of subjects and standards for information transmission and storage, creating highly heterogeneous environments for data searching and retrieval. Due to this heterogeneity, the user has to know many data standards and search tools to obtain the requested information. However, the fundamental problem for data manipulation lies in partially or fully unstructured data formats, such as text files, mail and news articles. For files in these formats, the user has to read each file to filter out the relevant information, which wastes time. The document may not be of interest to the user or, if it is, reading it in full may be unnecessary at the moment. Some information, such as calls for papers, product prices and economic statistics, has an associated temporal validity. Other information is updated periodically. Some of these temporal characteristics are explicit, while others are implicitly embedded in other types of data. Because temporal data are very difficult to retrieve automatically, outdated information is often used and opportunities are lost. This work describes a system for data extraction and summarization.
The main objective is to satisfy the user's needs for selecting, and consequently manipulating, information stored on a personal computer. To achieve this goal, we employ the concepts of Information Extraction (IE) and Knowledge-Based Systems. The input data are handled by an extraction procedure configured from a knowledge base defined by the user. The objective of this paper is to develop a System of Semantic Extraction of Information that classifies the extracted data into classes meaningful to the user and deduces the temporal validity of these data. This goal was achieved by generating a structured temporal database.
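The abstract describes an extractor configured from a user-defined knowledge base that classifies extracted data and determines its temporal validity. The sketch below only illustrates that idea under stated assumptions and is not the author's system: the knowledge base is a hypothetical keyword table (`KNOWLEDGE_BASE`), and temporal validity is assumed to end at the latest explicit ISO date (YYYY-MM-DD) found in the text. Implicit temporal expressions, which the abstract identifies as the hard case, are not handled.

```python
from __future__ import annotations

import re
from datetime import date

# Hypothetical user-defined knowledge base: each class is described by
# trigger keywords. This format is an illustrative assumption.
KNOWLEDGE_BASE = {
    "call-for-papers": ["call for papers", "submission deadline"],
    "product-price": ["price", "discount"],
}

# Assumption: explicit ISO dates (YYYY-MM-DD) mark the end of validity.
DATE_PATTERN = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def classify(text: str) -> list[str]:
    """Return every knowledge-base class whose keywords occur in the text."""
    lowered = text.lower()
    return [label for label, keywords in KNOWLEDGE_BASE.items()
            if any(k in lowered for k in keywords)]


def validity_end(text: str) -> date | None:
    """Return the latest explicit date in the text, or None if there is none."""
    dates = [date(int(y), int(m), int(d)) for y, m, d in DATE_PATTERN.findall(text)]
    return max(dates) if dates else None


def extract(text: str, today: date) -> dict:
    """Build one structured record: classes, validity end, and whether still valid."""
    end = validity_end(text)
    return {
        "classes": classify(text),
        "valid_until": end,
        # Records without an explicit date are treated as still valid.
        "still_valid": end is None or end >= today,
    }
```

For example, `extract("Call for papers: submission deadline 2024-05-01.", date(2024, 4, 1))` returns the class `["call-for-papers"]`, a `valid_until` of 1 May 2024, and `still_valid` set to `True`.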
45

Lazarski, Adam. "The importance of contextual factors on the accuracy of estimates in project management : an emergence of a framework for more realistic estimation process." Thesis, University of Bradford, 2014. http://hdl.handle.net/10454/13661.

Abstract:
Successful projects are characterized by the quality of their planning. Good planning that takes better account of contextual factors allows more accurate estimates to be achieved. As an outcome of this research, a new framework composed of best practices has emerged. It comprises an open platform that project experts and practitioners can work with efficiently and that researchers can develop further as required. The research investigation began in the autumn of 2008 with a pilot study and then proceeded through an inductive research process involving a series of eleven interviews: four with well-recognized experts in the field, four with different practitioners, and three group interviews. In addition, a forty-five-day observation period was conceptualized, together with other data sources, culminating in the proposal of a new framework for improving the accuracy of estimates. Furthermore, the emerging framework, together with a description of how to apply it, was systematically reviewed over four hundred twenty-five days of meetings, dedicated mostly to improving the use of a wide range of specific project management tools and techniques and to improving understanding of planning and its associated estimation process. This approach constituted an ongoing verification of the research findings against project management practice and also served as an invaluable resource for the researcher's professional and practice-oriented development. The results offered fresh insights into the importance of knowledge management in the estimation process, including the “value of not knowing”, the often overlooked phenomenon of underestimation and its potential to coexist with overestimation, and the use of negative buffer management in the critical chain concept to secure project deadlines.
The project also highlighted areas for improvement in future research that aims to use an inductive approach to achieve a socially agreed framework rather than a theory alone. In addition, improvements were suggested to the various qualitative tools employed in the customized data analysis process.
46

Marsolo, Keith Allen. "A workflow for the modeling and analysis of biomedical data." Columbus, Ohio : Ohio State University, 2007. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1180309265.

47

Hlosta, Martin. "Modul pro shlukovou analýzu systému pro dolování z dat." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2010. http://www.nusl.cz/ntk/nusl-237158.

Abstract:
This thesis deals with the design and implementation of a cluster analysis module for DataMiner, the data mining system currently being developed at FIT BUT. Until now, the system lacked a cluster analysis module, so the main objective of the thesis was to extend the system with one. Pavel Riedl worked on the module together with me. We created a common part for all the algorithms so that the system can easily be extended with other clustering algorithms. In the second part, I extended the clustering module by adding three density-based clustering algorithms: DBSCAN, OPTICS and DENCLUE. The algorithms were implemented, and appropriate sample data were chosen to verify their functionality.
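The thesis adds DBSCAN, OPTICS and DENCLUE to DataMiner. As an illustration of the first of these, here is a minimal DBSCAN sketch in plain Python; it is not the DataMiner module's code or interface, and the function names are hypothetical. A point with at least `min_pts` neighbours within radius `eps` (counting itself) is a core point; clusters grow outward from core points, and points reachable from no core point are labelled as noise.

```python
import math

NOISE = -1
UNVISITED = None


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also core: keep expanding
    return labels
```

For example, `dbscan([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)], eps=1.5, min_pts=3)` yields `[0, 0, 0, 1, 1, 1, -1]`: two dense groups and one isolated noise point. The neighbourhood query is a linear scan, so this sketch is O(n²); practical implementations usually rely on a spatial index.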
48

Dong, Hai. "A customized semantic service retrieval methodology for the digital ecosystems environment." Thesis, Curtin University, 2010. http://hdl.handle.net/20.500.11937/2345.

Abstract:
With the emergence of the Web and its pervasive intrusion into the lives of individuals, organizations, businesses, etc., people now realize that they are living in a digital environment analogous to an ecological ecosystem. Consequently, no individual or organization can ignore the huge impact of the Web on social well-being, growth and prosperity, or the changes it has brought to the world economy, transforming it from a self-contained, isolated and static environment into an open, connected and dynamic one. Recently, the European Union initiated a research vision for this ubiquitous digital environment, known as Digital (Business) Ecosystems. In the Digital Ecosystems environment, there are ubiquitous and heterogeneous species, and ubiquitous, heterogeneous, context-dependent and dynamic services provided or requested by those species. Nevertheless, existing commercial search engines lack sufficient semantic support: they cannot disambiguate user queries and cannot provide trustworthy and reliable service retrieval. Furthermore, current semantic service retrieval research focuses on retrieval in the Web service field and does not provide service retrieval functions that take into account the features of Digital Ecosystem services.
Hence, in this thesis, we propose a customized semantic service retrieval methodology that enables trustworthy and reliable service retrieval in the Digital Ecosystems environment by considering the heterogeneous, context-dependent and dynamic nature of services and the heterogeneous and dynamic nature of service providers and service requesters in Digital Ecosystems.
The customized semantic service retrieval methodology comprises: 1) a service information discovery, annotation and classification methodology; 2) a service retrieval methodology; 3) a service concept recommendation methodology; 4) a quality of service (QoS) evaluation and service ranking methodology; and 5) a service domain knowledge updating, and service-provider-based Service Description Entity (SDE) metadata publishing, maintenance and classification methodology.
The service information discovery, annotation and classification methodology is designed to discover ubiquitous service information on the Web, annotate the discovered service information with ontology mark-up languages, and classify the annotated service information by means of specific service domain knowledge, taking into account the heterogeneous and context-dependent nature of Digital Ecosystem services and the heterogeneous nature of service providers. The methodology is realized by a prototype Semantic Crawler, which aims to discover service advertisements and service provider profiles in webpages and to annotate this information with service domain ontologies.
The service retrieval methodology enables service requesters to precisely retrieve the annotated service information, taking into account the heterogeneous nature of Digital Ecosystem service requesters. The methodology is presented through a prototype Service Search Engine.
Because service requesters can be divided into a group that has relevant knowledge about their service requests and a group that does not, we provide a separate service retrieval module for each. The module for the first group enables service requesters to retrieve service information directly by querying its attributes. The module for the second group enables service requesters to interact with the search engine to express their queries by means of service domain knowledge, and then to retrieve service information based on those queries.
The service concept recommendation methodology addresses incomplete or incorrect queries. It enables the search engine to recommend relevant concepts to service requesters once they find that the service concepts they eventually selected cannot be used to express their service requests. We assume that there is some overlap between the selected concepts and the concepts denoting the service requests, because service requesters' understanding of their requests shapes the concepts they select over a series of human-computer interactions. Therefore, a semantic similarity model is designed that seeks concepts semantically similar to the selected ones.
The QoS evaluation and service ranking methodology allows service requesters to evaluate the trustworthiness of a service advertisement and to rank retrieved service advertisements by their QoS values, taking into account the context-dependent nature of services in Digital Ecosystems. The core of this methodology is an extended CCCI (Correlation of Interaction, Correlation of Criterion, Clarity of Criterion, and Importance of Criterion) metric, which allows a service requester to evaluate the performance of a service provider in a service transaction based on QoS evaluation criteria in a specific service domain.
The evaluation result is then combined with previous results to produce the final QoS value of the service advertisement in a service domain. Service requesters can rank service advertisements by considering their QoS values under each criterion in a service domain.
The methodology for service domain knowledge updating and service-provider-based SDE metadata publishing, maintenance and classification allows: 1) knowledge users to update the service domain ontologies employed in the service retrieval methodology, taking into account the dynamic nature of services in Digital Ecosystems; and 2) service providers to update their service profiles and manually annotate their published service advertisements by means of service domain knowledge, taking into account the dynamic nature of service providers in Digital Ecosystems. Service domain knowledge updating is realized by a voting system for proposed changes to service domain knowledge, in which the votes of domain experts and normal users are assigned different weights.
To validate the customized semantic service retrieval methodology, we build a prototype, a Customized Semantic Service Search Engine. Using the prototype, we test the mathematical algorithms involved in the methodology through simulation and validate the proposed functions of the methodology through functional testing.
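The abstract states that proposed changes to service domain knowledge are decided by a vote in which domain experts' and normal users' votes carry different weights. A minimal sketch of such a weighted vote follows; the weights (3 for experts, 1 for normal users) and the acceptance rule (weighted approval share strictly above 0.5) are hypothetical, since the abstract gives neither.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical weights and threshold: the abstract only says that expert and
# normal-user votes are weighted differently, not what the values are.
WEIGHTS = {"expert": 3.0, "normal": 1.0}
ACCEPT_THRESHOLD = 0.5


@dataclass
class Vote:
    role: str      # "expert" or "normal"
    approve: bool  # True = approve the proposed ontology change


def weighted_approval(votes: list[Vote]) -> float:
    """Share of total vote weight that approves the proposal (0.0 if no votes)."""
    total = sum(WEIGHTS[v.role] for v in votes)
    if total == 0:
        return 0.0
    return sum(WEIGHTS[v.role] for v in votes if v.approve) / total


def accept_change(votes: list[Vote]) -> bool:
    """Accept a proposed change when the weighted approval share exceeds the threshold."""
    return weighted_approval(votes) > ACCEPT_THRESHOLD
```

With one expert approving and two normal users rejecting, the weighted approval share is 3 / 5 = 0.6, so the change is accepted, whereas an unweighted majority would reject it.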
49

Blondet, Gaëtan. "Système à base de connaissances pour le processus de plan d'expériences numériques." Thesis, Compiègne, 2017. http://www.theses.fr/2017COMP2363/document.

Abstract:
Companies' need for competitiveness in a globalized economic context depends on improving product quality and reducing costs and time to market. To achieve these objectives, numerical simulation is commonly used to design complex products and draws on diverse expertise. Numerical Designs of Experiments (NDoE) are increasingly used to simulate the variability of the product's properties and environment. An NDoE process provides methods for planning and analysing a set of simulations in order to better control product performance. The problem addressed has two aspects. On the one hand, defining an NDoE process involves many choices and the use of complex methods, requiring advanced expertise; this definition is all the more complex when the simulation model is complex and expensive to run. On the other hand, using NDoE produces large volumes of data by multiplying the number of simulations. This work focuses on quickly obtaining the optimal configuration of the NDoE process in order to shorten the preparation and execution of an NDoE. The work is oriented towards reusing company knowledge in a knowledge-based system composed of a specific ontology, to capitalise and share knowledge, and an inference engine based on Bayesian networks, to propose efficient and innovative configurations to designers. This proposal is illustrated by an application to an industrial product from the automotive sector.
In order to improve industrial competitiveness, product design relies more and more on numerical tools, such as numerical simulation, to develop better and cheaper products faster. Numerical Designs of Experiments (NDoE) are increasingly used to include variability in simulation processes, in order to design more robust, reliable and optimized products earlier in the product development process. Nevertheless, an NDoE process may be too expensive to apply to a complex product, because of the high computational cost of the model and the high number of required experiments. Several methods exist to decrease this computational cost, but they require expert knowledge to be applied efficiently. In addition, an NDoE process produces a large amount of data that must be managed. The aim of this research is to propose a solution for defining, as quickly as possible, an efficient NDoE process for complex products, one that produces as much useful information as possible with a minimal number of simulations. The objective is to shorten both the process definition and execution steps. A knowledge-based system is proposed, based on a specific ontology and a Bayesian network, to capitalise, share and reuse knowledge and data in order to predict the best NDoE process definition for a new product. This system is validated on a product from the automotive industry.
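Both abstracts stress that an NDoE process can require many simulation runs. As a textbook illustration of why (not the method proposed in the thesis, which selects a process configuration with an ontology and a Bayesian network), a full factorial design runs every combination of factor levels, so k factors at l levels each need l^k runs. The factor names and values below are hypothetical.

```python
from __future__ import annotations

from itertools import product
from math import prod


def full_factorial(levels: dict[str, list[float]]) -> list[dict[str, float]]:
    """Enumerate every combination of factor levels; each dict is one simulation run."""
    names = list(levels)
    return [dict(zip(names, combo)) for combo in product(*(levels[n] for n in names))]


def run_count(levels: dict[str, list[float]]) -> int:
    """Number of runs in a full factorial design: the product of the level counts."""
    return prod(len(v) for v in levels.values())


# Hypothetical factors for a structural simulation (illustration only).
design = {
    "thickness_mm": [1.0, 1.5, 2.0],
    "load_kN": [10.0, 20.0, 30.0],
    "temperature_C": [-20.0, 20.0, 60.0],
}
```

The three hypothetical factors give 27 runs; ten factors at three levels would already require 3^10 = 59,049 runs, which is impractical when each run is an expensive simulation.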
50

Rougier, Simon. "Apport des images satellites à très haute résolution spatiale couplées à des données géographiques multi-sources pour l’analyse des espaces urbains." Thesis, Strasbourg, 2016. http://www.theses.fr/2016STRAH019/document.

Abstract:
Cities face many environmental issues. Their managers need tools and good knowledge of their territory. One objective is to better understand how the grey and green infrastructures fit together, in order to analyse and represent them. Another is to propose a methodology for mapping the urban structure at the scale of urban fabrics, taking these infrastructures into account. Existing databases do not map vegetation exhaustively. The first step is therefore to extract tree and herbaceous vegetation from Pléiades satellite images through object-based analysis and active-learning classification. Based on these classifications and on multi-source data, the mapping of urban fabrics relies on a knowledge discovery approach using indicators drawn from urbanism and landscape ecology. This methodology is developed on Strasbourg and then applied to Rennes.
Climate change presents cities with significant environmental challenges. Urban planners need decision-making tools and a better knowledge of their territory. One objective is to better understand the link between the grey and the green infrastructures in order to analyse and represent them. The second objective is to propose a methodology to map the urban structure at urban fabric scale, taking into account the grey and green infrastructures. In current databases, vegetation is not mapped exhaustively. Therefore, the first step is to extract tree and grass vegetation from Pléiades satellite images using object-based image analysis and active learning classification. Based on those classifications and on multi-source data, an approach based on knowledge discovery in databases is proposed. It focuses on a set of indicators drawn mostly from urbanism and landscape ecology. The methodology is built on Strasbourg and then applied to Rennes to validate it and check its reproducibility.
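The abstract says that urban fabrics are mapped through knowledge discovery on indicators drawn largely from urbanism and landscape ecology, computed from the vegetation classifications. The specific indicators are not listed, so the sketch below shows only one simple, assumed kind: the share of each land-cover class within a spatial unit, similar in spirit to the percentage-of-landscape (PLAND) metric used in landscape ecology. Units are represented as hypothetical lists of classified pixel labels.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical class labels for a classified image; the thesis's actual
# legend is not given in the abstract.
TREE, GRASS, BUILT = "tree", "grass", "built"


def class_proportions(pixels: list[str]) -> dict[str, float]:
    """Fraction of each land-cover class within one spatial unit (empty unit -> {})."""
    if not pixels:
        return {}
    counts = Counter(pixels)
    total = len(pixels)
    return {label: n / total for label, n in counts.items()}


def block_indicators(blocks: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """Vegetation indicators per unit: tree share, grass share and total green share."""
    result = {}
    for block_id, pixels in blocks.items():
        p = class_proportions(pixels)
        tree, grass = p.get(TREE, 0.0), p.get(GRASS, 0.0)
        result[block_id] = {"tree": tree, "grass": grass, "green": tree + grass}
    return result
```

For instance, a unit whose pixels are labelled tree, tree, grass and built gets a tree share of 0.5, a grass share of 0.25 and a total green share of 0.75.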