Dissertations / Theses on the topic 'Data Domains'

Consult the top 50 dissertations / theses for your research on the topic 'Data Domains.'

1

Crockett, Keeley Alexandria. "Fuzzy rule induction from data domains." Thesis, Manchester Metropolitan University, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.243720.

2

McLean, David. "Improving generalisation in continuous data domains." Thesis, University of Manchester, 1995. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.283816.

3

Hsu, Bo-June (Bo-June Paul). "Language Modeling for limited-data domains." Thesis, Massachusetts Institute of Technology, 2009. http://hdl.handle.net/1721.1/52796.

Abstract:
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student submitted PDF version of thesis.
Includes bibliographical references (p. 99-109).
With the increasing focus of speech recognition and natural language processing applications on domains with limited amount of in-domain training data, enhanced system performance often relies on approaches involving model adaptation and combination. In such domains, language models are often constructed by interpolating component models trained from partially matched corpora. Instead of simple linear interpolation, we introduce a generalized linear interpolation technique that computes context-dependent mixture weights from features that correlate with the component confidence and relevance for each n-gram context. Since the n-grams from partially matched corpora may not be of equal relevance to the target domain, we propose an n-gram weighting scheme to adjust the component n-gram probabilities based on features derived from readily available corpus segmentation and metadata to de-emphasize out-of-domain n-grams. In scenarios without any matched data for a development set, we examine unsupervised and active learning techniques for tuning the interpolation and weighting parameters. Results on a lecture transcription task using the proposed generalized linear interpolation and n-gram weighting techniques yield up to a 1.4% absolute word error rate reduction over a linearly interpolated baseline language model. As more sophisticated models are only as useful as they are practical, we developed the MIT Language Modeling (MITLM) toolkit, designed for efficient iterative parameter optimization, and released it to the research community.
With a compact vector-based n-gram data structure and optimized algorithm implementations, the toolkit not only improves the running time of common tasks by up to 40x, but also enables efficient parameter tuning for language modeling techniques that were previously deemed impractical.
by Bo-June (Paul) Hsu.
Ph.D.
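As an illustration of the baseline the abstract above refers to, the following is a minimal sketch of simple linear interpolation of component language model probabilities, p(w | h) = sum_i lambda_i * p_i(w | h). It is not the thesis's generalized, context-dependent technique and does not use the MITLM toolkit; the function name and the probability values are hypothetical.

```python
from typing import List


def interpolate(component_probs: List[float], weights: List[float]) -> float:
    """Linearly interpolate component probabilities p_i(w | h) with fixed weights.

    The weights are context-independent here, which is the 'simple linear
    interpolation' baseline; they must be non-negative and sum to 1.
    """
    if len(component_probs) != len(weights):
        raise ValueError("need exactly one weight per component model")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * p for w, p in zip(weights, component_probs))


# Hypothetical probabilities of the same word under two component models.
p_mixed = interpolate([0.02, 0.10], [0.7, 0.3])
```

A context-dependent scheme would replace the fixed weight list with weights computed per n-gram context, which is the direction the abstract describes.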
4

MAHOTO, NAEEM AHMED. "Data mining techniques for complex application domains." Doctoral thesis, Politecnico di Torino, 2013. http://hdl.handle.net/11583/2506368.

Abstract:
The emergence of advanced communication techniques has increased the availability of large collections of data in electronic form in a number of application domains, including healthcare, e-business, and e-learning. Every day, a large number of records are stored electronically. However, finding useful information in such large data collections is challenging. Data mining technology aims to automatically extract hidden knowledge from large data repositories by exploiting sophisticated algorithms. The hidden knowledge in the electronic data may potentially be used to improve the procedures, productivity, and reliability of several application domains. The PhD activity focused on novel and effective data mining approaches to tackle the complex data coming from two main application domains: healthcare data analysis and textual data analysis. The research activity in the context of healthcare data addressed the application of different data mining techniques to discover valuable knowledge from real exam-log data of patients. In particular, efforts have been devoted to the extraction of medical pathways, which can be exploited to analyze the actual treatments followed by patients. The derived knowledge not only provides useful information to deal with the treatment procedures but may also play an important role in future predictions of potential patient risks associated with medical treatments. The research effort in textual data analysis is twofold. On the one hand, a novel approach to the discovery of succinct summaries of large document collections has been proposed. On the other hand, the suitability of an established descriptive data mining technique for supporting domain experts in making decisions has been investigated. Both research activities focus on adopting widely used exploratory data mining techniques for textual data analysis, which requires overcoming intrinsic limitations of traditional algorithms in handling textual documents efficiently and effectively.
5

RICUPERO, GIUSEPPE. "Exploring Data Hierarchies to Discover Knowledge in Different Domains." Doctoral thesis, Politecnico di Torino, 2019. http://hdl.handle.net/11583/2744938.

6

Carapelle, Claudia. "On the Satisfiability of Temporal Logics with Concrete Domains." Doctoral thesis, Universitätsbibliothek Leipzig, 2015. http://nbn-resolving.de/urn:nbn:de:bsz:15-qucosa-190987.

Abstract:
Temporal logics are a very popular family of logical languages, used to specify properties of abstracted systems. In the last few years, many extensions of temporal logics have been proposed, in order to address the need to express more than just abstract properties. In our work we study temporal logics extended by local constraints, which make it possible to express quantitative properties of data values from an arbitrary relational structure called the concrete domain. An example of a concrete domain is (Z, <, =), where the integers are considered as a relational structure over the binary order relation and the equality relation. Formulas of temporal logics with constraints are evaluated on data-words or data-trees, in which each node or position is labeled by a vector of data from the concrete domain. We call the constraints local because they can only compare values at a fixed distance inside such models. Several positive results regarding the satisfiability of LTL (linear temporal logic) with constraints over the integers have been established in the past years, while the corresponding results for branching time logics were only partial. In this work we prove that satisfiability of CTL* (computation tree logic) with constraints over the integers is decidable and also lift this result to ECTL*, a proper extension of CTL*. We also consider other classes of concrete domains, particularly ones that are "tree-like". We consider semi-linear orders, ordinal trees and trees of a fixed height, and prove decidability in this framework as well. At the same time we prove that our method cannot be applied in the case of the infinite binary tree or the infinitely branching infinite tree. We also look into extending the expressiveness of our logic by adding non-local constraints, and find that this leads to undecidability of the satisfiability problem, even on very simple domains like (Z, <, =). We then find a way to restrict the power of the non-local constraints to regain decidability.
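To make the notion of a local constraint over the concrete domain (Z, <, =) more tangible, here is a minimal sketch that evaluates one such constraint on a finite data word. It only checks a given finite word and says nothing about satisfiability, which is what the thesis studies; the representation of data words as lists of integer tuples and the name `holds_everywhere` are assumptions made for illustration.

```python
import operator
from typing import Callable, List, Tuple

# A finite data word: each position carries a vector of integers.
DataWord = List[Tuple[int, ...]]


def holds_everywhere(
    word: DataWord,
    x: int,
    y: int,
    distance: int,
    relation: Callable[[int, int], bool],
) -> bool:
    """Check relation(word[i][x], word[i + distance][y]) at every position i
    for which position i + distance also exists in the word.

    The constraint is 'local' because it only compares values that are a
    fixed number of positions apart.
    """
    if distance < 0:
        raise ValueError("distance must be non-negative")
    return all(
        relation(word[i][x], word[i + distance][y])
        for i in range(len(word) - distance)
    )


# Hypothetical data word with two integer components per position.
word = [(1, 5), (2, 5), (4, 5)]
strictly_increasing = holds_everywhere(word, 0, 0, 1, operator.lt)
second_component_constant = holds_everywhere(word, 1, 1, 1, operator.eq)
```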
7

McGregor, Simon. "Artificial neural networks for novel data domains : principles and examples." Thesis, University of Sussex, 2009. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.497000.

Abstract:
I assume that the reader of this thesis is reasonably familiar with artificial neural network (ANN) methods in computer science, including the multi-layer perceptron (MLP) and the backpropagation training method. I have not needed to use any difficult or esoteric mathematics; the major mathematical concept encountered in the thesis is the multiset (which is easy to grasp for anyone familiar with set theory). Certain chapters also make use of the notions of partial derivatives, inner products in arbitrary vector spaces, and metrics.
8

Ng, Siu Hung. "An extension of the relational data model to incorporate ordered domains." Thesis, University College London (University of London), 1997. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.268033.

9

Baxter, Rolf Hugh. "Recognising high-level agent behaviour through observations in data scarce domains." Thesis, Heriot-Watt University, 2012. http://hdl.handle.net/10399/2597.

Abstract:
This thesis presents a novel method for performing multi-agent behaviour recognition without requiring large training corpora. The reduced need for data means that robust probabilistic recognition can be performed within domains where annotated datasets are traditionally unavailable (e.g. surveillance, defence). Human behaviours are composed from sequences of underlying activities that can be used as salient features. We do not assume that the exact temporal ordering of such features is necessary, so we can represent behaviours using an unordered “bag-of-features”. A weak temporal ordering is imposed during inference to match behaviours to observations and replaces the learnt model parameters used by competing methods. Our three-tier architecture comprises low-level video tracking, event analysis and high-level inference. High-level inference is performed using a new, cascading extension of the Rao-Blackwellised Particle Filter. Behaviours are recognised at multiple levels of abstraction and can contain a mixture of solo and multi-agent behaviour. We validate our framework using the PETS 2006 video surveillance dataset and our own video sequences, in addition to a large corpus of simulated data. We achieve a mean recognition precision of 96.4% on the simulated data and 89.3% on the combined video data. Our “bag-of-features” framework is able to detect when behaviours terminate, accurately explains agent behaviour despite significant quantities of low-level classification errors in the input, and can even detect agents who change their behaviour.
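The unordered "bag-of-features" representation mentioned in the abstract can be illustrated with a multiset of observed activities. The sketch below only shows the representation, not the thesis's inference procedure or particle filter; the activity labels and the function name are hypothetical.

```python
from collections import Counter
from typing import Iterable


def bag_of_features(activities: Iterable[str]) -> Counter:
    """Represent a behaviour as a multiset of activities, discarding order."""
    return Counter(activities)


# Two hypothetical observation sequences with the same activities
# in a different temporal order map to the same bag.
bag_a = bag_of_features(["enter", "place_bag", "leave"])
bag_b = bag_of_features(["place_bag", "enter", "leave"])
same_behaviour_signature = bag_a == bag_b
```

Because a `Counter` keeps multiplicities, a sequence in which an activity occurs twice yields a different bag from one in which it occurs once.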
10

Ferguson, Alexander B. "Higher order strictness analysis by abstract interpretation over finite domains." Thesis, University of Glasgow, 1995. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.308143.

11

Ntantamis, Christos. "Identifying hidden boundaries within economic data in the time and space domains." Thesis, McGill University, 2009. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=115616.

Abstract:
This thesis presents methodological contributions to the modeling of regimes in the time or space domain of economic data by introducing a number of algorithms from engineering applications and substantially modifying them so that they can be used in economic applications. The objective is twofold: to estimate the parameters of such models, and to identify the corresponding boundaries between regimes. The models used belong to the class of Finite Mixture Models and their natural extensions for the case of dependent data, Hidden Markov Models (see McLachlan and Peel 2000). Mixture models are extremely useful in the modeling of heterogeneity in a cluster analysis context; the components of the mixtures, or the states, will correspond to the different latent groups, e.g. homogeneous regions such as the housing submarkets or regimes in the case of stock market returns.
The thesis discusses issues of alternative estimation algorithms that provide larger model flexibility in capturing the underlying data dynamics, and of procedures that allow the selection of the number of the regimes in the data.
The first part introduces a model of spatial association for housing markets, which is approached in the context of spatial heterogeneity. A Hedonic Price Index model is considered, i.e. a model where the price of the dwelling is determined by its structural and neighborhood characteristics. Remaining spatial heterogeneity is modeled as a Finite Mixture Model for the residuals of the Hedonic Index. The Finite Mixture Model is estimated using the Figueiredo and Jain (2002) approach. The overall ability of the model to identify spatial heterogeneity is evaluated through a set of simulations. The model was applied to Los Angeles County housing price data for the year 2002. The statistically identified number of submarkets, after taking into account the dwellings' structural characteristics, is found to be considerably smaller than the number imposed by either geographical or administrative boundaries, thus making the model more suitable for mass assessment applications.
The second part of the thesis introduces a Duration Hidden Markov Model to represent regime switches in the stock market; the duration of each state of the Markov Chain is explicitly modeled as a random variable that depends on a set of exogenous variables. Therefore, the model not only allows the endogenous determination of the different regimes but also estimates the effect of the explanatory variables on the regimes' durations. The model is estimated on NYSE returns using the short-term interest rate and the interest rate spread as exogenous variables. The estimation results coincide with existing findings in the literature, in terms of regimes' characteristics, and are compatible with basic economic intuition, in terms of the effect of the exogenous variables on regimes' durations.
The final part of the thesis considers a Hidden Markov Model (HMM) approach to the task of detecting structural breaks, which are defined as the data points where the underlying Markov Chain switches from one state to another. A new methodology is proposed to estimate all aspects of the model: the number of regimes, the parameters of the model corresponding to each regime, and the locations of regime switches. One of the main advantages of the proposed methodology is that it allows for different model specifications across regimes. The performance of the overall procedure, denoted IMI after the initials of the component algorithms, is validated by two sets of simulations: one in which only the parameters are permitted to differ across regimes, and one that also permits differences in the functional forms. The IMI method performs very well across all specifications in both sets of simulations.
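The definition of a structural break used above, a data point at which the underlying Markov chain switches state, can be sketched as follows. The code assumes the hidden state sequence has already been decoded by some method; it does not implement the IMI procedure or any estimation step, and the function name and example sequence are hypothetical.

```python
from typing import List, Sequence


def switch_points(states: Sequence[int]) -> List[int]:
    """Return the indices t at which the decoded state differs from the
    state at t - 1, i.e. the locations of regime switches."""
    return [t for t in range(1, len(states)) if states[t] != states[t - 1]]


# Hypothetical decoded regime sequence with two switches.
decoded = [0, 0, 1, 1, 1, 0]
breaks = switch_points(decoded)  # [2, 5]
```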
12

Filannino, Michele. "Data-driven temporal information extraction with applications in general and clinical domains." Thesis, University of Manchester, 2016. https://www.research.manchester.ac.uk/portal/en/theses/datadriven-temporal-information-extraction-with-applications-in-general-and-clinical-domains(34d7e698-f8a8-4fbf-b742-d522c4fe4a12).html.

Abstract:
The automatic extraction of temporal information from written texts is pivotal for many Natural Language Processing applications such as question answering, text summarisation and information retrieval. However, Temporal Information Extraction (TIE) is a challenging task because of the number of types of expressions (durations, frequencies, times, dates) and their high morphological variability and ambiguity. Among existing approaches, rule-based ones are the most common, while data-driven ones are under-explored. This thesis introduces a novel domain-independent data-driven TIE strategy. The identification strategy is based on machine learning sequence labelling classifiers using features selected through an extensive exploration. Results are further optimised using an a posteriori label-adjustment pipeline. The normalisation strategy is rule-based and builds on a pre-existing system. The methodology has been applied to both a specific (clinical) domain and the generic domain, and has been officially benchmarked at the i2b2/2012 and TempEval-3 challenges, ranking 3rd and 1st respectively. The results show the TIE task to be more challenging in the clinical domain (overall accuracy 63%) than in the general domain (overall accuracy 69%). Finally, this thesis also presents two applications of TIE. One of them introduces the concept of the temporal footprint of a Wikipedia article, and uses it to mine the life span of persons. In the other, TIE techniques are used to improve pre-existing information retrieval systems by filtering out temporally irrelevant results.
13

Bujuru, Swathi. "Event recognition in epizootic domains." Kansas State University, 2010. http://hdl.handle.net/2097/7070.

Abstract:
Master of Science
Department of Computing and Information Sciences
William H. Hsu
In addition to named entities such as persons, locations, organizations, and quantities, which convey factual information, there are other entities and attributes that relate identifiable objects in the text and can provide valuable additional information. In the field of epizootics, these include specific properties of diseases such as their name, location, species affected, and current confirmation status. These are important for compiling the spatial and temporal statistics and other information needed to track diseases, leading to applications such as detection and prevention of bioterrorism. Toward this objective, we present a system (Rule Based Event Extraction System in Epizootic Domains) that can be used to automatically extract infectious disease outbreaks from unstructured data by using pattern matching. In addition to extracting events, the components of this system can help provide structured and summarized data that can be used to differentiate confirmed events from suspected events, answer questions regarding when and where the disease was prevalent, develop a model for predicting future disease outbreaks, and support visualization using interfaces such as Google Maps. While developing this system, we consider research issues that include document relevance classification, entity extraction, recognition of outbreak events in the disease domain, and support for visualization of events. We present a sentence-based approach for extracting outbreak events from the epizootic domain, with tasks such as extracting event attributes (disease name, location, species, confirmation status, and date) and classifying events into two categories of confirmation status: confirmed or suspected. The present approach shows how important confirmation status is in extracting disease-based events from unstructured data. A pyramid approach using reference summaries is used for evaluating the extracted events.
14

Satish, Sneha. "A Mechanism Design Approach for Mining 3-clusters across Datasets from Multiple Domains." University of Cincinnati / OhioLINK, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1471345904.

15

Ellis, James E. "Data visualization of ISR and C2 assets across multiple domains for battlespace awareness." Thesis, Monterey, California. Naval Postgraduate School, 2009. http://hdl.handle.net/10945/4630.

Abstract:
Approved for public release, distribution unlimited
In this thesis, we have developed a prototype application that is capable of providing ISR situational awareness to C2 nodes at the Joint Task Force (JTF) level and below. The prototype application is also capable of providing information that will allow joint intelligence planners to plan ISR operations more efficiently, including allocation of intelligence-gathering platforms and sensors, and processing, exploitation, and dissemination (PED) assets to information requests.
16

Boulos, Rasha. "Human genome segmentation into structural domains : from chromatin conformation data to nuclear functions." Thesis, Lyon, École normale supérieure, 2015. http://www.theses.fr/2015ENSL1024/document.

Abstract:
The replication program of about one half of mammalian genomes is characterized by megabase-sized replication U/N-domains. These domains are bordered by master replication origins (MaOris) corresponding to ~200 kb regions of open chromatin favorable for early initiation of replication and transcription. Thanks to recent high-throughput chromosome conformation capture technologies (Hi-C), 3D co-localization frequency matrices between all genome loci are now experimentally determined. It appeared that U/N-domains were related to the organization of the genome into structural units. In this thesis, we performed a combined analysis of human Hi-C data and replication timing profiles to further explore the structure/function relationships in the nucleus. This led us to describe novel large (>3 Mb) replication timing split-U domains, also bordered by MaOris; to demonstrate that the replication wave initiated at MaOris depends only on the time during S phase; and to show that chromatin folding is compatible with a 3D equilibrium in early-replicating euchromatin regions, turning to a 2D equilibrium in the late-replicating heterochromatin regions associated with the nuclear lamina. Representing Hi-C co-localization matrices as structural networks and deploying graph theoretical tools, we also demonstrated that MaOris are long-range interconnected hubs in the structural network, central to the 3D organization of the genome, and we developed a novel multi-scale methodology based on graph wavelets to objectively delineate structural units from Hi-C data. This work allows us to discuss the relationship between replication domains and structural units across different human cell lines.
17

Real, Brian T., and James E. Ellis. "Data visualization of ISR and C2 assets across multiple domains for battlespace awareness." Monterey, California : Naval Postgraduate School, 2009. http://edocs.nps.edu/npspubs/scholarly/theses/2009/Sep/09Sep%5FReal.pdf.

Abstract:
Thesis (M.S. in Information Technology Management)--Naval Postgraduate School, September 2009.
Thesis Advisor(s): Osmundson, John ; Dolk, Daniel. "September 2009." Description based on title screen as viewed on November 5, 2009. Author(s) subject terms: ISR Situational Awareness, Information Request. Includes bibliographical references (p. 65-66). Also available in print.
18

De, Lazzari Eleonora. "Gene families distributions across bacterial genomes : from models to evolutionary genomics data." Thesis, Paris 6, 2017. http://www.theses.fr/2017PA066406/document.

Abstract:
Comparative genomics is a fundamental discipline for unravelling evolutionary biology. To move beyond a merely descriptive knowledge of it, the first challenge is to develop a higher-level description of the content of a genome. We therefore used the modular representation of genomes to explore quantitative laws that regulate how genomes are built from elementary functional and evolutionary ingredients. The first part sets off from the observation that the number of domains sharing the same function increases as a power law of the genome size. Since functional categories are aggregates of domain families, we asked how the abundance of domains performing a specific function emerges from evolutionary moves at the family level. We found that domain families are also characterized by family-dependent scaling laws. The second chapter provides a theoretical framework for the emergence of shared components from dependency in empirical component systems with non-binary abundances. We defined a positive model that builds a realization from a set of components linked in a dependency network. The ensemble of resulting realizations reproduces both the distribution of shared components and the law for the growth of the number of distinct families with genome size. The last chapter extends the component systems approach to microbial ecosystems. Using our findings about family scaling laws, we analyzed how the abundance of domain families in a metagenome is affected by the constraint of power-law scaling of family abundance in individual genomes. The result is the definition of an observable whose functional form contains quantitative information on the original composition of the metagenome.
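The power-law relationship described above, in which the number of domains n grows with genome size s as n = A * s^beta, is commonly estimated by ordinary least squares on log-transformed data. The sketch below shows that generic procedure on synthetic numbers; it is an assumption for illustration, not the fitting method or the data used in the thesis.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(sizes: Sequence[float], counts: Sequence[float]) -> Tuple[float, float]:
    """Fit counts = prefactor * sizes ** exponent by least squares in log-log space.

    Returns (prefactor, exponent). All inputs must be strictly positive.
    """
    if len(sizes) != len(counts) or len(sizes) < 2:
        raise ValueError("need at least two (size, count) pairs")
    xs = [math.log(s) for s in sizes]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    exponent = sxy / sxx
    prefactor = math.exp(mean_y - exponent * mean_x)
    return prefactor, exponent


# Synthetic data generated from counts = 0.5 * size ** 2 (hypothetical values).
genome_sizes = [1000.0, 2000.0, 4000.0]
domain_counts = [0.5 * s ** 2 for s in genome_sizes]
prefactor, exponent = fit_power_law(genome_sizes, domain_counts)
```

On noise-free synthetic data the fit recovers the generating exponent and prefactor up to floating-point error; on real counts the log-log regression is only one of several possible estimators.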
19

Trinity, Luke. "Complex systems analysis in selected domains: animal biosecurity & genetic expression." ScholarWorks @ UVM, 2020. https://scholarworks.uvm.edu/graddis/1190.

Abstract:
I first broadly define the study of complex systems, identifying language to describe and characterize mechanisms of such systems that is applicable across disciplines. An overview of methods is provided, including the description of a software development methodology which defines how a combination of computer science, statistics, and mathematics is applied to specified domains. This work describes strategies to facilitate timely completion of robust and adaptable projects which vary in complexity and scope. A biosecurity informatics pipeline is outlined, which is an abstraction useful in organizing the analysis of biological data from cells. This is followed by specific applications of complex systems study to the fields of animal biosecurity and genetic expression. I provide evidence that social cues need to be considered by livestock facility managers in order to increase the disease resiliency of agricultural systems. I also identify significant changes in genetic expression from recent experiments which are advancing the frontiers of regenerative medicine. Areas of future work are discussed, including issues related to agriculture and water quality, as well as studies of human behavior and risk perception using experimental gaming simulations.
20

Hare, Matthew Peter. "Weaver - a hybrid artificial intelligence laboratory for modelling complex, knowledge- and data-poor domains." Thesis, University of Aberdeen, 1999. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.287609.

Abstract:
Weaver is a hybrid knowledge discovery environment which fills a current gap in Artificial Intelligence (AI) applications, namely tools designed for the development and exploration of existing knowledge in complex, knowledge- and data-poor domains. Such domains are typified by incomplete and conflicting knowledge, and data which are very hard to collect. Without the support of robust domain theory, many experimental and modelling assumptions have to be made whose impact on field work and model design is uncertain or simply unknown. Compositional modelling, experimental simulation, inductive learning, and experimental reformulation tools are integrated within a methodology analogous to Popper's scientific method of critical discussion. The purpose of Weaver is to provide a 'laboratory' environment in which a scientist can develop domain theory through an iterative process of in silico experimentation, theory proposal, criticism, and theory refinement. After refinement within Weaver, this domain theory may be used to guide field work and model design. Weaver is a pragmatic response to tool development in complex, knowledge- and data-poor domains. In the compositional modelling tool, a domain-independent algorithm for dynamic multiple scale bridging has been developed. The multiple perspective simulation tool provides an object class library for the construction of multiple simulations that can be flexibly and easily altered. The experimental reformulator uses a simple domain-independent heuristic search to help guide the scientist in selecting the experimental simulations that need to be carried out in order to critically test and refine the domain theory. An example of Weaver's use in an ecological domain is provided in the exploration of the possible causes of population cycles in red grouse (Lagopus lagopus scoticus). The problem of AI tool validation in complex, knowledge- and data-poor domains is also discussed.
21

Zhao, Hong. "A Bayesian Analysis of BMI Data of Children from Small Domains: Adjustment for Nonresponse." Digital WPI, 2006. https://digitalcommons.wpi.edu/etd-theses/1138.

Abstract:
"We analyze data on body mass index (BMI) in the third National Health and Nutrition Examination Survey, predict finite population BMI stratified by different domains of race, sex and family income, and investigate what adjustment is needed for the nonresponse mechanism. We built two types of models to analyze the data. In the ignorable nonresponse models, each model is within the hierarchical Bayesian framework. For Model 1, BMI is only related to age. For Model 2, the linear regression is height on weight, and weight on age. The parameters, nonresponse and the nonsampled BMI values are generated from each model. We mainly use the composition method to obtain samples for Model 1, and the Gibbs sampler to generate samples for Model 2. We also built two nonignorable nonresponse models corresponding to the ignorable nonresponse models. Our nonignorable nonresponse models have one important feature: the response indicators are not related to BMI and neither weight nor height, but we use the same parameters corresponding to the ignorable nonresponse models. We use the sampling importance resampling (SIR) algorithm to generate the parameters and the nonresponse and nonsampled values. Our results show that the ignorable nonresponse Model 2 (modeling height and weight) is more reliable than Model 1 (modeling BMI), since the predicted finite population mean BMI of Model 1 changes very little with age. The predicted finite population mean of BMI is affected by the different domains of race, sex and family income. Our results also show that the nonignorable nonresponse models infer smaller standard deviations of the regression coefficients and population BMI than the ignorable nonresponse models. This is due to the fact that we are incorporating information from the response indicators, and there are no additional parameters. Therefore, the nonignorable nonresponse models allow wider inference."
22

Cruz, Ethan E. "Coupled inviscid-viscous solution methodology for bounded domains: Application to data center thermal management." Diss., Georgia Institute of Technology, 2015. http://hdl.handle.net/1853/54316.

Abstract:
Computational fluid dynamics and heat transfer (CFD/HT) models have been employed as the dominant technique for the design and optimization of both new and existing data centers. Inviscid modeling has shown great speed advantages over the full Navier-Stokes CFD/HT models (over 20 times faster), but is incapable of capturing the physics in the viscous regions of the domain. A coupled inviscid-viscous solution method (CIVSM) for bounded domains has been developed in order to increase both the solution speed and accuracy of CFD/HT models. The methodology consists of an iterative solution technique that divides the full domain into multiple regions consisting of at least one set of viscous, inviscid, and interface regions. The full steady, Reynolds-Averaged Navier-Stokes (RANS) equations with turbulence modeling are used to solve the viscous domain, while the inviscid domain is solved using the Euler equations. By combining the increased speed of the inviscid solver in the inviscid regions, along with the viscous solver’s ability to capture the turbulent flow physics in the viscous regions, a faster and potentially more accurate solution can be obtained for bounded domains that contain inviscid regions which encompass more than half of the domain, such as data centers.
23

Chen, Cheng. "Inter-gestural Coordination in Temporal and Spatial Domains in Italian: Synchronous EPG + UTI Data." Doctoral thesis, Scuola Normale Superiore, 2019. http://hdl.handle.net/11384/86022.

Abstract:
This dissertation explores the temporal coordination of articulatory gestures in various segmental conditions in Italian, by comparing onset and coda singletons as well as word-final and intervocalic consonant clusters in a Tuscan variety of Italian. Articulatory models of syllable structure assume that the coordination between the vocalic gesture and the consonantal gesture may differ in onset vs. coda and in singletons vs. clusters. Based on previous literature on different languages, we expect to find differences in the temporal coordination of singletons and clusters in Italian too. In addition, recent literature suggests that the articulatory and coarticulatory properties of the segments play an important role in determining the details of the coordination patterns, and that not all segments or segmental sequences behave in the same way as far as their gestural coordination relations are concerned. Thus, an additional aim of this work is to compare consonants with different coarticulatory properties (in the sense of modifications of C articulation in varying vocalic contexts) and look for possible relations between coarticulation and coordination patterns. The methodology used is new. We used an original system for the acquisition, real-time synchronization and analysis of acoustic, electropalatographic (EPG) and ultrasound tongue imaging (UTI) data, called SynchroLing. EPG and UTI instrumental techniques provide complementary information on, respectively, linguo-palatal contact patterns in the anterior vocal tract and midsagittal profiles of the whole tongue, including postdorsum and root. SynchroLing allows real-time inspection of contacts in the artificial palate and tongue midsagittal movements, coupled with acoustics. [...]
24

CANNAS, LAURA MARIA. "A framework for feature selection in high-dimensional domains." Doctoral thesis, Università degli Studi di Cagliari, 2013. http://hdl.handle.net/11584/266105.

Abstract:
The introduction of DNA microarray technology has led to an enormous impact on cancer research, allowing researchers to analyze the expression of thousands of genes in concert and relate gene expression patterns to clinical phenotypes. At the same time, machine learning methods have become one of the dominant approaches in an effort to identify cancer gene signatures, which could increase the accuracy of cancer diagnosis and prognosis. The central challenge is to identify the group of features (i.e. the biomarker) which take part in the same biological process or are regulated by the same mechanism, while minimizing the biomarker size, as it is known that few gene expression signatures are most accurate for phenotype discrimination. To account for these competing concerns, previous studies have proposed different methods for selecting a single subset of features that can be used as an accurate biomarker, capable of differentiating cancer from normal tissues, predicting outcome, detecting recurrence, and monitoring response to cancer treatment. The aim of this thesis is to propose a novel approach that pursues the concept of finding many potential predictive biomarkers. It is motivated by the biological assumption that, given the large number of different relationships which are possible between genes, it is highly possible to combine genes in many ways to produce signatures with similar predictive power. An intriguing advantage of our approach is that it increases the statistical power to capture more reliable and consistent biomarkers, while a single predictor may not necessarily provide important clues as to biological differences of interest. Specifically, this thesis presents a framework for feature selection that is based upon a genetic algorithm, a well known approach recently proposed for feature selection.
To mitigate the high computational cost usually required by this algorithm, the framework structures the feature selection process into a multi-step approach which combines different categories of data mining methods. Starting from a ranking process performed at the first step, the following steps detail a wrapper approach where a genetic algorithm is coupled with a classifier to explore different feature subspaces looking for optimal biomarkers. The thesis presents in detail the framework and its validation on popular datasets which are usually considered as benchmarks by the research community. The competitive classification power of the framework has been carefully evaluated and empirically confirms the benefits of its adoption. In addition, experimental results obtained by the proposed framework are comparable to those obtained by analogous literature proposals. Finally, the thesis contributes additional experiments which confirm the framework's applicability to the categorization of the subject matter of documents.
25

Holupirek, Alexander [Verfasser]. "Declarative Access to Filesystem Data : New application domains for XML database management systems / Alexander Holupirek." Konstanz : Bibliothek der Universität Konstanz, 2012. http://d-nb.info/102684715X/34.

26

Alsayat, Ahmed Mosa. "Efficient genetic k-means clustering algorithm and its application to data mining on different domains." Thesis, Bowie State University, 2017. http://pqdtopen.proquest.com/#viewpdf?dispub=10239708.

Abstract:

Because of the massive increase in the data streams available and being produced, the areas of data mining and machine learning have become increasingly popular. This takes place as companies, organizations and industries seek out optimal methods and techniques for processing these large data sets. Machine learning is a branch of artificial intelligence that involves creating programs that autonomously perform different data mining techniques when exposed to data streams. The study evaluates two very different domains in an effort to provide a better and more optimized applicable method of clustering than is currently being used. We examine the use of data mining in healthcare, as well as the use of these techniques in the social media domain. Testing the proposed technique on these two drastically different domains offers us valuable insights into the performance of the proposed technique across domains.

This study aims at reviewing the existing methods of clustering and presenting an enhanced k-means clustering algorithm by using a novel method called Optimize Cluster Distance (OCD) applied to the social media domain. The OCD method maximizes the distance between clusters by pair-wise re-clustering to enhance the quality of the clusters. For the healthcare domain, k-means was applied along with a Self Organizing Map (SOM) to get an optimal number of clusters. The possibility of getting bad positions of centroids in k-means was solved by applying the genetic algorithm to k-means in the social media and healthcare domains. The OCD was applied again to enhance the quality of the produced clusters. In both domains, compared to the conventional k-means, the analysis shows that the proposed k-means is accurate and achieves better clustering performance along with valuable insights for each cluster. The approach is unsupervised, scalable and can be applied to various domains.

27

Reinartz, Thomas [Verfasser]. "Focusing solutions for data mining : analytical studies and experimental results in real world domains / T. Reinartz." Berlin, 1999. http://d-nb.info/965635090/34.

28

BERRETTA, SERENA. "A Real-Time Adaptive Sampling Strategy Optimized by Uncertainty for Spatial Data Analysis on Complex Domains." Doctoral thesis, Università degli studi di Genova, 2022. https://hdl.handle.net/11567/1099778.

Abstract:
Environmental monitoring is used to reveal the state of the environment, to inform experts and help them to prioritise actions in the context of environmental policies. Environmental sampling is the way the environment is interrogated to get measures of environmental (e.g., physical, chemical) parameters in a limited set of locations (samples). Environmental properties vary from place to place in a continuum and there are infinitely many places at which we might record what they are like, but in practice we can measure them at only a finite number of places by sampling. The location in which samples are collected is therefore crucial. The focus of the thesis is the study of a mathematical framework that supports a reasoned and non-random sampling of environmental variables, with the aim of defining a methodological approach to optimise the number of samples required while maintaining a target precision. The arrangement of points is not selected or known a priori; conversely, we propose an iterative process where the next-sample location is determined on-the-fly on the basis of the environmental scenario that is delineated more and more accurately at each iteration. At each iteration, the distribution map is updated with the new incoming data. The geostatistical analysis we implement provides a predicted value and the related uncertainty about that value, actually providing an uncertainty map alongside the predicted distribution. The system responds to the current state by requiring a measurement in the area with the highest uncertainty, to reduce uncertainty and increase accuracy. Environmental survey areas to monitor are often characterised by very complex boundaries. Unstructured grids are more flexible than structured grids in faithfully representing complex geometries. The usage of unstructured grids introduces another innovation aspect studied in the thesis, which is the change of support model.
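The iterative loop described in this abstract (measure next wherever uncertainty is highest, then update) can be illustrated with a minimal Python sketch. It is not the thesis's geostatistical method: the uncertainty measure here is an assumed stand-in (distance to the nearest existing sample) for the uncertainty map the abstract mentions, and the names `next_sample` and `adaptive_sampling` are hypothetical.

```python
import math

def next_sample(candidates, sampled):
    """Pick the candidate location whose uncertainty proxy is largest."""
    def uncertainty(point):
        # Assumed proxy, not from the thesis: distance to the nearest
        # location that has already been measured.
        return min(math.dist(point, s) for s in sampled)
    return max(candidates, key=uncertainty)

def adaptive_sampling(candidates, initial, n_new):
    """Add up to n_new locations, one at a time, each where uncertainty peaks."""
    sampled = list(initial)
    for _ in range(n_new):
        remaining = [c for c in candidates if c not in sampled]
        if not remaining:
            break
        sampled.append(next_sample(remaining, sampled))
    return sampled

# Eleven candidate locations on a line, starting from one measurement at x = 0.
candidates = [(float(x), 0.0) for x in range(11)]
plan = adaptive_sampling(candidates, initial=[(0.0, 0.0)], n_new=2)
```

In this toy setting the first new measurement goes to the far end of the line and the second to the midpoint, which is the behaviour one would expect from any rule that targets the least-informed area.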
29

Sofman, Boris. "Online Learning Techniques for Improving Robot Navigation in Unfamiliar Domains." Research Showcase @ CMU, 2010. http://repository.cmu.edu/dissertations/43.

Abstract:
Many mobile robot applications require robots to act safely and intelligently in complex unfamiliar environments with little structure and limited or unavailable human supervision. As a robot is forced to operate in an environment that it was not engineered or trained for, various aspects of its performance will inevitably degrade. Roboticists equip robots with powerful sensors and data sources to deal with uncertainty, only to discover that the robots are able to make only minimal use of this data and still find themselves in trouble. Similarly, roboticists develop and train their robots in representative areas, only to discover that they encounter new situations that are not in their experience base. Small problems resulting in mildly sub-optimal performance are often tolerable, but major failures resulting in vehicle loss or compromised human safety are not. This thesis presents a series of online algorithms to enable a mobile robot to better deal with uncertainty in unfamiliar domains in order to improve its navigational abilities, better utilize available data and resources and reduce risk to the vehicle. We validate these algorithms through extensive testing onboard large mobile robot systems and argue how such approaches can increase the reliability and robustness of mobile robots, bringing them closer to the capabilities required for many real-world applications.
30

Ghanem, Amal Saleh. "Probabilistic models for mining imbalanced relational data." Thesis, Curtin University, 2009. http://hdl.handle.net/20.500.11937/2266.

Abstract:
Most data mining and pattern recognition techniques are designed for learning from flat data files with the assumption of equal populations per class. However, most real-world data are stored as rich relational databases that generally have imbalanced class distribution. For such domains, a rich relational technique is required to accurately model the different objects and relationships in the domain, which cannot be easily represented as a set of simple attributes, and at the same time handle the imbalanced class problem. Motivated by the significance of mining imbalanced relational databases that represent the majority of real-world data, learning techniques for mining imbalanced relational domains are investigated. In this thesis, the employment of probabilistic models in mining relational databases is explored. In particular, it focuses on Probabilistic Relational Models (PRMs), which were proposed as an extension of the attribute-based Bayesian Networks. The effectiveness of PRMs in mining real-world databases was explored by learning PRMs from a real-world university relational database. A visual data mining tool is also proposed to aid the interpretation of the outcomes of the PRM learned models. Despite the effectiveness of PRMs in relational learning, the performance of PRMs as predictive models is significantly hindered by the imbalanced class problem. This is due to the fact that PRMs share the assumption common to other learning techniques of relatively balanced class distributions in the training data. Therefore, this thesis proposes a number of models utilizing the effectiveness of PRMs in relational learning and extending it for mining imbalanced relational domains. The first model introduced in this thesis examines the problem of mining imbalanced relational domains for a single two-class attribute. The model is proposed by enriching the PRM learning with the ensemble learning technique.
The premise behind this model is that an ensemble of models would attain better performance than a single model, as an instance misclassified by one of the models can often be correctly classified by the others. Based on this approach, another model is introduced to address the problem of mining multiple imbalanced attributes, in which it is important to predict several attributes rather than a single one. In this model, the ensemble bagging sampling approach is exploited to attain a single model for mining several attributes. Finally, the thesis outlines the problem of imbalanced multi-class classification and introduces a generalized framework to handle this problem for both relational and non-relational domains.
31

Bussoli, Ilaria. "Heterogeneous Graphical Models with Applications to Omics Data." Doctoral thesis, Università degli studi di Padova, 2019. http://hdl.handle.net/11577/3423293.

Abstract:
Thanks to the advances in bioinformatics and high-throughput methodologies of the last decades, an unprecedented amount of biological data coming from various experiments in metabolomics, genomics and proteomics is available. This has led researchers to conduct more and more comprehensive molecular profiling of biological samples through multiple different aspects of genomic activities, thus introducing new challenges in the development of statistical tools to integrate and model multi-omics data. The main research objective of this thesis is to develop a statistical framework for modelling the interactions between genes when their activity is measured on different domains; to do so, our approach relies on the concept of a multilayer network, and on how structures of this type can be combined with graphical models for mixed data, i.e., data comprising variables of different nature (e.g., continuous, categorical, skewed, to name a few). We further develop an algorithm for learning the structure of the undirected multilayer networks underlying the proposed models, showing its promising results through empirical analyses on cancer data, which were downloaded from the public TCGA consortium.
32

Scheidker, E. J., R. D. Pendley, R. M. Rashkin, R. D. Weking, B. G. Cruse, and M. A. Bracken. "IMACCS: A Progress Report on NASA/GSFC's COTS-Based Ground Data Systems, and Their Extension into New Domains." International Foundation for Telemetering, 1996. http://hdl.handle.net/10150/611446.

Abstract:
International Telemetering Conference Proceedings / October 28-31, 1996 / Town and Country Hotel and Convention Center, San Diego, California
The Integrated Monitoring, Analysis, and Control COTS System (IMACCS), a system providing real time satellite command and telemetry support, orbit and attitude determination, events prediction, and data trending, was implemented in 90 days at NASA Goddard Space Flight Center (GSFC) in 1995. This paper describes upgrades made to the original commercial, off-the-shelf (COTS)-based prototype. These upgrades include automation capability and spacecraft Integration and Testing (I&T) capability. A further extension to the prototype is the establishment of a direct RF interface to a spacecraft. As with the original prototype, all of these enhancements required lower staffing levels and reduced schedules compared to custom system development approaches. The team's approach to system development, including taking advantage of COTS and legacy software, is also described.
33

Thaher, Mohammed. "Efficient Algorithms for the Maximum Convex Sum Problem." Thesis, University of Canterbury. Computer Science and Software Engineering, 2009. http://hdl.handle.net/10092/2102.

Abstract:
The work of this thesis covers the Maximum Subarray Problem (MSP) from a new perspective. Previous research and current methods for the MSP use a rectangular shape for finding the maximum sum or gain. The rectangular region used previously is not flexible enough to cover various data distributions. This research suggests using a convex shape, which is expected to give optimised and efficient results. The steps to build towards using the proposed convex shape in the context of MSP are as follows: studying the available research in-depth to extract the potential guidelines for this thesis research; implementing an appropriate convex shape algorithm; generalising the main algorithm (based on dynamic programming) to find the second maximum sum, the third maximum sum and up to the Kth maximum sum; and finally conducting experiments to evaluate the outcomes of the algorithms in terms of the maximum gain, time complexity, and the running time. In this research, the following findings were achieved: one of the achievements is presenting an efficient algorithm, which determines the boundaries of the convex shape while having the same time complexity as other existing algorithms (the prefix sum was used to speed up the convex shape algorithm in finding the maximum sum). Besides the first achievement, the algorithm was generalized to find up to the Kth maximum sum. Finding the Kth maximum convex sum was shown to be useful in many applications; one of these (based on a study with the cooperation of Christchurch Hospital in New Zealand) is accurately and efficiently locating brain tumours. Besides this application, the research findings present new approaches to applying MSP algorithms in real-life applications, such as data mining, computer vision, astronomy, economics, chemistry, and medicine.
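For readers unfamiliar with the underlying problem, the role that prefix sums play in the MSP can be seen in the classic one-dimensional case. The Python sketch below is illustrative only: it is the standard linear-time prefix-sum formulation, not the convex-shape or Kth-maximum algorithm of the thesis, and the function name `max_subarray` is hypothetical.

```python
def max_subarray(values):
    """Return (best_sum, start, end) so that values[start:end] is a
    non-empty contiguous slice with the largest possible sum."""
    if not values:
        raise ValueError("values must be non-empty")
    prefix = 0       # sum of values[:i]
    min_prefix = 0   # smallest prefix sum seen before position i
    min_index = 0    # index at which that smallest prefix sum ends
    best = None
    for i, v in enumerate(values, start=1):
        prefix += v
        # Best slice ending at i is the current prefix minus the smallest
        # earlier prefix.
        candidate = prefix - min_prefix
        if best is None or candidate > best[0]:
            best = (candidate, min_index, i)
        if prefix < min_prefix:
            min_prefix = prefix
            min_index = i
    return best

result = max_subarray([-2, 1, -3, 4, -1, 2, 1, -5, 4])
```

Here `result` identifies the slice `[4, -1, 2, 1]`. The two-dimensional rectangular and convex variants discussed in the abstract build on the same idea of subtracting prefix sums, but over regions rather than intervals.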
34

Wolman, Stacey D. "The effects of biographical data on the prediction of domain knowledge." Thesis, Available online, Georgia Institute of Technology, 2005, 2005. http://etd.gatech.edu/theses/available/etd-08022005-140654/.

Abstract:
Thesis (M. S.)--Psychology, Georgia Institute of Technology, 2006.
Dr. Phillip L. Ackerman, Committee Chair ; Dr. Ruth Kanfer, Committee Co-Chair ; Dr. Lawrence James, Committee Member. Includes bibliographical references.
35

Fountalis, Ilias. "From spatio-temporal data to a weighted and lagged network between functional domains: Applications in climate and neuroscience." Diss., Georgia Institute of Technology, 2016. http://hdl.handle.net/1853/55008.

Abstract:
Spatio-temporal data have become increasingly prevalent and important for both science and enterprises. Such data are typically embedded in a grid with a resolution larger than the true dimensionality of the underlying system. One major task is to identify the distinct semi-autonomous functional components of the spatio-temporal system and to infer their interconnections. In this thesis, we propose two methods that identify the functional components of a spatio-temporal system. Next, an edge inference process identifies the possibly lagged and weighted connections between the system’s components. The weight of an edge accounts for the magnitude of the interaction between two components; the lag associated with each edge accounts for the temporal ordering of these interactions. The first method, geo-Cluster, infers the spatial components as “areas”: spatially contiguous, non-overlapping sets of grid cells satisfying a homogeneity constraint in terms of their average pair-wise cross-correlation. However, in real physical systems the underlying physical components might overlap. To this end we also propose δ-MAPS, a method that first identifies the epicenters of activity of the functional components of the system and then creates domains – spatially contiguous, possibly overlapping sets of grid cells that satisfy the same homogeneity constraint. The proposed framework is applied in climate science and neuroscience. We show how these methods can be used to evaluate cutting edge climate models and identify lagged relationships between different climate regions. In the context of neuroscience, the method successfully identifies well-known “resting state networks” as well as a few areas forming the backbone of the functional cortical network. Finally, we contrast the proposed methods with dimensionality reduction techniques (e.g., clustering, PCA/ICA) and show the limitations of the latter.
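The notion of a lagged, weighted edge between two components can be made concrete with a small Python sketch. It assumes a simplification that the abstract does not confirm: each component is summarised by a single time series, the edge weight is taken as the Pearson correlation, and the lag is the shift that maximises its absolute value. The names `pearson` and `lagged_edge` are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def lagged_edge(x, y, max_lag):
    """Return (lag, correlation) maximising |corr(x[t], y[t + lag])|
    over lags in -max_lag..max_lag. A positive lag means x leads y."""
    best_lag, best_corr = 0, 0.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[:len(x) - lag], y[lag:]
        else:
            a, b = x[-lag:], y[:len(y) + lag]
        if len(a) < 2:
            continue  # not enough overlap to estimate a correlation
        r = pearson(a, b)
        if abs(r) > abs(best_corr):
            best_lag, best_corr = lag, r
    return best_lag, best_corr

# y repeats x two steps later, so the inferred edge should carry lag 2.
x = [0, 2, -1, 3, -2, 1, 4, -3, 0, 2, -1, 5]
y = [0, 0] + x[:-2]
lag, weight = lagged_edge(x, y, max_lag=3)
```

A real analysis would also need a significance test for each candidate edge, since the maximum over many lags is biased upward; that step is omitted here.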
36

Exibard, Léo. "Automatic synthesis of systems with data." Electronic Thesis or Diss., Aix-Marseille, 2021. http://www.theses.fr/2021AIXM0312.

Abstract:
We often interact with machines that react in real time to our actions (robots, websites, etc.). They are modelled as reactive systems, which continuously interact with their environment. The goal of reactive synthesis is to automatically generate such a system from the specification of its behaviour, so as to replace the error-prone low-level development phase with the design of a high-level specification. In the classical setting, the set of signals available to the machine is assumed to be finite. However, this assumption is not realistic for modelling systems which process data from a possibly infinite set (e.g. a client id, a sensor value, etc.). The goal of this thesis is to extend reactive synthesis to the case of data words. We study a model that is well-suited for this more general setting, and examine the feasibility of its synthesis problem(s). We also explore the case of non-reactive systems, where the machine does not have to react immediately to its inputs.
37

Dhondge, Hrishikesh. "Structural characterization of RNA binding to RNA recognition motif (RRM) domains using data integration, 3D modeling and molecular dynamic simulation." Electronic Thesis or Diss., Université de Lorraine, 2023. http://www.theses.fr/2023LORR0103.

Abstract:
Cette thèse a été réalisée dans le cadre d'un projet Européen plus vaste (ITN RNAct) dans lequel des approches informatiques et biologiques étaient combinées pour progresser vers la synthèse de nouveaux domaines protéiques capables de se fixer sur des séquences spécifiques d'ARN. L'objectif spécifique de cette thèse était de concevoir et développer des outils informatiques pour mieux exploiter les connaissances existantes sur les domaines à Motif de Reconnaissance de l'ARN (RRM) lors de la modélisation 3D des complexes RRM-ARN. Les domaines RRMs représentent 50% de toutes les protéines fixant l'ARN et sont trouvées dans environ 2% de toutes les régions codantes du génome humain. Cependant, du fait de la grande diversité des domaines RRMs, il n'y a eu jusqu'à présent que très peu de succès rapportés dans la conception de nouveaux domaines RRMs. La contribution centrale de cette thèse est la construction d'une base de données relationnelle appelée (InteR3M) qui intègre des informations de séquence, de structure et de fonction sur les domaines RRMs. La base de données InteR3M (href{https://inter3mdb.loria.fr/}{https://inter3mdb.loria.fr/}) contient 400,892 instances de domaines RRM (dérivées d'entrées UniProt) et 1,456 structures 3D déterminées expérimentalement (dérivées d'entrées PDB), qui correspondent à seulement 303 instances distinctes de domaines RRM. De plus, InteR3M contient 459,859 interactions atomiques entre RRM et acides nucléiques, dérivées de 656 structures 3D dans lesquelles le domaine RRM forme un complexe avec un ARN ou un ADN. Au cours du processus de collecte de données, des incohérences ont été détectées dans la classification de plusieurs instances de domaines RRMs dans les bases de données de domaines protéiques populaires CATH et Pfam. 
Ceci m'a conduit à proposer une approche originale (CroMaSt) pour résoudre ce problème, à partir de la mise en correspondance des instances structurales de domaines RRMs entre ces deux bases de données et de l'alignement structural des domaines sans correspondance avec une structure prototype du domaine RRM. Le workflow CroMast est disponible sur le Workflow Hub Européen (href{https://workflowhub.eu/workflows/390}{https://workflowhub.eu/workflows/390}). Les informations de séquence et de structure intégrées dans la base de données InteR3M ont ensuite été utilisées pour aligner entre eux tous les domaines RRM et cartographier toutes les interactions RRM-ARN sur cet alignement en vue d'identifier les différents modes de liaison de l'ARN aux domaines RRM. Ceci a conduit au développement, avec nos partenaires RNAct de VUB (Vrije Universiteit Brussel), de l'outil `RRMScorer'. Cet outil contribue au déchiffrage du code de reconnaissance RRM-ARN en calculant les probabilités de liaison entre les nucléotides de l'ARN et les acides aminés des domaines RRM à certaines positions de l'alignement. Les contacts atomiques entre RRMs et ARN ont aussi été utilisés pour identifier des motifs d'ancrage, c'est-à-dire des prototypes des positions 3D atomiques (relatives au squelette protéique) d'un nucléotide interagissant par empilement (`stacking') avec un acide aminé aromatique conservé. Ces ancres peuvent être utilisées comme des contraintes dans un protocole d'amarrage ancré (`anchored docking'). Le pipeline `RRM-RNA dock' est présenté ici et il intègre à la fois les motifs d'ancrage extraits de la base de données InteR3M et les scores de liaison de RRMScorer. Finalement, la simulation en dynamique moléculaire (MD) est un autre outil informatique testé dans cette thèse pour contribuer à la modélisation 3D des complexes RRM-ARN. Des protocoles MD préliminaires mais prometteurs sont décrits au titre d'essais visant à distinguer entre les complexes RRM-ARN à liaison forte ou faible
This thesis was carried out within a larger European project (ITN RNAct) in which computer science and biology approaches were combined to make progress towards the synthesis of new protein domains able to bind to specific RNA sequences. The specific goal of this thesis was to design and develop computational tools that better exploit existing knowledge on RNA Recognition Motif (RRM) domains in the 3D modeling of RRM-RNA complexes. RRMs account for 50% of all RNA-binding proteins and are present in about 2% of the protein-coding regions of the human genome. However, due to the large diversity of RRMs, there have been very few successful examples of new RRM design so far. A central achievement of this thesis is the construction of a relational database called InteR3M that integrates sequence, structural and functional information about RRM domains. The InteR3M database (https://inter3mdb.loria.fr/) contains 400,892 RRM domain instances (derived from UniProt entries) and 1,456 experimentally solved 3D structures (derived from PDB entries) corresponding to only 303 distinct RRM instances. In addition, InteR3M stores 459,859 atom-atom interactions between RRMs and nucleic acids, retrieved from 656 3D structures in which the RRM domain is complexed with RNA or DNA. During the data collection procedure, inconsistencies were detected in the classification of several RRM instances in the popular domain databases CATH and Pfam. This led me to propose an original approach (CroMaSt) to solve this issue, based on cross-mapping of structural instances of RRMs between these two domain databases and on the structural alignment of unmapped instances with an RRM structural prototype. The CroMaSt CWL workflow is available on the European Workflow Hub at https://workflowhub.eu/workflows/390.
Sequence and structural information stored in the InteR3M database was then used to align RRM domains and map all RRM-RNA interactions onto this alignment, in order to identify the different binding modes of RNA to RRM domains. This led to the development, with RNAct partners at VUB (Vrije Universiteit Brussel), of the RRMScorer tool. This tool contributes to deciphering the RRM-RNA recognition code by computing binding probabilities between RNA nucleotides and RRM amino acids at certain positions of the alignment. Atomic contacts between RRMs and RNA were also used to identify anchoring patterns, i.e. prototypes of 3D atomic positions (relative to the protein backbone) of a nucleotide stacked on a conserved aromatic amino acid. These anchors can be used as constraints in anchored docking protocols. The RRM-RNA dock pipeline is presented here; it integrates both the anchoring patterns extracted from InteR3M and the binding scores from RRMScorer. Finally, molecular dynamics (MD) simulation is another computational tool tested in this thesis to contribute to the 3D modeling of RRM-RNA complexes. Preliminary but promising MD protocols are described as attempts to distinguish between strongly and weakly binding RRM-RNA complexes.
38

Heilmann, Zeno. "CRS-stack-based seismic reflection imaging for land data in time and depth domains CRS-Stapelungsbasierte Zeit- und Tiefenbereichsabbildung reflexionsseismischer Landdaten /." [S.l. : s.n.], 2007. http://swbplus.bsz-bw.de/bsz262418770abs.pdf.

39

Cadarso, Salamanca Manuel. "Influence of different frequencies order in a multi-step LSTM forecast for crowd movement in the domains of transportation and retail." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-254884.

Abstract:
This thesis presents an approach to predict crowd movement in defined places using LSTM neural networks. Specifically, it analyses the influence that different frequencies of time series have on both the crowd forecast and the design of the architecture in the domains of transportation and retail. The architecture is also affected because changes in the frequency cause an increase or decrease in the quantity of data, so the architecture should be adapted accordingly. Previous research in the field of crowd prediction has mainly focused on anticipating the next movement of the crowd rather than estimating the number of people in a particular place during a specific time range. These studies have used techniques such as Random Forest or Feed-Forward neural networks to determine the influence that the different frequencies have on the forecast results. This thesis instead applies LSTM neural networks to analyse this influence and uses specific field-related techniques to find the best parameters for forecasting future crowd movement. The results show that the order of the frequency of a time series clearly affects the outcomes of the predictions in the fields of transportation and retail, and that this influence is positive when the order of the frequency of the time series is able to capture the shape of the frequency of the forecast. Taking the order of the frequency into account, the forecast results for the analysed places show an improvement of 40% for SMAPE and 50% for RMSE compared to the Naive approach and other techniques. Furthermore, they indicate that there is a relation between the order of the frequency and the components of the architectures.
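The abstract reports its improvements in terms of SMAPE and RMSE against a naive baseline. A minimal sketch of how such metrics could be computed is given here for orientation only. The SMAPE variant used (mean of absolute values in the denominator, expressed in percent), the persistence-style `naive_forecast`, and the sample counts are all assumptions made for illustration; the abstract does not state which definitions the thesis uses.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def smape(actual, predicted):
    """Symmetric MAPE in percent (assumed variant: denominator (|a| + |p|) / 2).

    Terms whose denominator is zero contribute 0 to the sum.
    """
    total = 0.0
    for a, p in zip(actual, predicted):
        denom = (abs(a) + abs(p)) / 2
        if denom != 0:
            total += abs(a - p) / denom
    return 100 * total / len(actual)


def naive_forecast(series):
    """Hypothetical naive baseline: predict each value as the previous one."""
    return series[:-1]


# Hypothetical hourly visitor counts, not data from the thesis.
counts = [100, 120, 90, 110]
actual = counts[1:]
baseline = naive_forecast(counts)
print(rmse(actual, baseline), smape(actual, baseline))
```

A model's relative improvement over the baseline would then be computed from the two metric values, for example `1 - model_rmse / baseline_rmse`.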
40

Schneider, Sellanes Ruben Gerardo. "Estratégias de computação seqüenciais e paralelas sobre espaços coerentes." reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 1996. http://hdl.handle.net/10183/24495.

Abstract:
A concrete data structure, or cds, is a quadruple (C, V, E, ⊢) consisting of a set C of cells, a set V of values, a set E of events and an enabling relation ⊢. The set of states of a cds is a concrete domain, which can be considered the "abstract" counterpart of the cds. In the same way, event domains (which are more general than concrete domains) are the abstract counterpart of event structures. We show the relation of concrete domains and event domains to coherence spaces, as well as the relation of cds and event structures to webs of coherence spaces. Intuitively, a cds is a web of a coherence space if no cell c of C needs any event to be enabled (equivalently, every cell is enabled by the empty set), i.e. ∀c ∈ C, ∅ ⊢ c. Put differently, a cds is a web of a coherence space if the set of states of the cds is a coherence space. We define linear algorithms as states of a cds, in the style of Curien's sequential algorithms ([CUR 86]). In particular, the cds considered are webs of coherence spaces. We show how to obtain a cds !A → B from a stable function f: A → B. The linear algorithm of this cds contains all the computational strategies (sequential and parallel) that compute the underlying function f; this implies that linear algorithms can be considered a kind of meta-algorithm. We show that for every sequential computational strategy of a linear algorithm there exists a Curien sequential algorithm that computes the same function, and conversely. We define computational strategies in such a way that semantics can be given to segments of programs. We also define a composition operation for strategies, with the advantage that the computational strategy of a program can be obtained as the composition of the strategies of its segments.
41

Bremner, Ausra. "Impact of migration to the UK on Lithuanian migrant family relationships." Thesis, De Montfort University, 2017. http://hdl.handle.net/2086/15354.

Abstract:
Since the opening of European borders to new EU member states, a large number of immigrants have continued to arrive in the UK, and specifically in the East Midlands and East Anglia. To date, little or no research has been conducted to understand their experience and adjustment in this part of the country. With my research I aimed to find out how Lithuanian emigration affected family relationships and to identify issues that families face when a member emigrates on his/her own. I conducted qualitative research using different methods of data collection: online (Skype) and face-to-face interviews, focus group and remote discussion techniques. Data were coded using NVivo8 and NVivo10 and analysed using grounded theory. Findings show that the transition stage, while a family lives apart, puts an enormous strain on relationships within a family. However, it neither leads to nor causes break-ups, provided the family was a close unit prior to migration. The final results support the emerging theory that if the family had good relationships back in Lithuania, the challenges of migration would not break that bond; on the contrary, they would strengthen relationships. My findings answer the initial research question as to whether migration to the UK affects Lithuanian family relationships by suggesting that it does so no more than any other stressful life event, e.g. death, childbirth, job loss, illness or a house move. Findings suggest that, if families discuss matters and look for solutions together, the negative impact of migration might be avoided or lessened. My research contributes to knowledge by applying novel frameworks such as grounded theory and Layder’s theory of social domains in order to analyse and understand the Lithuanian migration phenomenon in the UK, particularly in East Anglia and the East Midlands.
42

Gharib, Hamid. "Domain data typing." Thesis, University of Newcastle Upon Tyne, 1999. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.267005.

43

Åberg, Elin. "Is it possible to define different process domains in stream systems based on remote data? : Comparing surficial geology, geomorphological characteristics in the landscape and channel slope between lakes, rapids and slow-flowing reaches." Thesis, Umeå universitet, Institutionen för ekologi, miljö och geovetenskap, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-162933.

Abstract:
Restoration of stream channels has become a common way of trying to restore both the channels and the ecosystems that were earlier channelized, mainly to facilitate the movement of timber. According to previous studies, much of this restoration has been performed without a sufficiently detailed plan and with too little focus on how the landscape interacts with the restoration, which minimizes the potential to learn from possible mistakes. In this study, a hydrological analysis of the Hjuken river was carried out to examine whether remote data, analysed using GIS, could be used to identify three different process domains (lakes, slow-flowing reaches and rapids), and whether it is possible to determine the process domain by examining three variables: channel slope, surficial geology and the geomorphological characteristics of the landscape. Based on the statistical treatment and analysis of the data, the results show a significant difference between every process domain for every variable, except for channel slope between slow-flowing reaches and rapids. This indicates that each of the analysed variables could be a crucial factor in most cases. However, the result does not seem reliable when compared with previous studies. The conclusion of the study is that the error in the identification of the process domains stems from the orthophotos; remote data is too weak to use as the only source for this kind of analysis. However, the definition of process domains is probably more diffuse than today’s description. More studies are needed on each process domain, and three types are probably not enough: there should either be subclasses for each process domain or additional process domains.
44

Dintelmann, Eva. "Fluids in the exterior domain of several moving obstacles /." Berlin : wvb, Wiss. Verl, 2007. http://www.wvberlin.de/data/inhalt/dintelmann.html.

45

Beatton, Douglas Anthony. "The economics of happiness : a lifetime perspective." Thesis, Queensland University of Technology, 2011. https://eprints.qut.edu.au/50009/1/Douglas_Beatton_Thesis.pdf.

Abstract:
The three studies in this thesis focus on happiness and age and seek to contribute to our understanding of happiness change over the lifetime. The first study contributes by offering an explanation for what was evolving into a ‘stylised fact’ in the economics literature, the U-shape of happiness in age. No U-shape is evident if one makes a visual inspection of the age-happiness relationship in the German socio-economic panel data, and it seems counter-intuitive that we just have to wait until we get old to be happy. Eliminating the very young, the very old, and the first timers from the analysis did not explain away regression results supporting the U-shape of happiness in age, but fixed-effect analysis did. Analysis revealed that reverse causality arising from time-invariant individual traits explained the U-shape of happiness in age in the German population, and the results were robust across six econometric methods. Robustness was added to the German fixed-effect finding by replicating it with the Australian and the British socio-economic panel data sets. During analysis of the German data an unexpected finding emerged: an exceedingly large negative linear effect of age on happiness in fixed-effect regressions. There is a large self-reported happiness decline among those who remain in the German panel. A similar decline over time was not evident in the Australian or the British data. After testing away age, time and cohort effects, a time-in-panel effect was found. Germans who remain in the panel for longer progressively report lower levels of happiness. Because time-in-panel effects have not been included in happiness regression specifications, our estimates may be biased; perhaps some economics of happiness studies that used German panel data need revisiting. The second study builds upon the fixed-effect finding of the first study and extends our view of lifetime happiness to a cohort little visited by economists, children.
Initial analysis extends our view of lifetime happiness beyond adulthood and reveals a happiness decline in adolescent (15 to 23 year-old) Australians that is twice the size of the happiness decline we see in older Australians (75 to 86 year-olds), who we expect to be unhappy due to declining income, failing health and the onset of death. To resolve a difference of opinion in the literature as to whether childhood happiness decreases, increases, or remains flat in age, survey instruments and an Internet-based survey were developed and used to collect data from four hundred 9 to 14 year-old Australian children. Applying the data to a Model of Childhood Happiness revealed that the natural environment life-satisfaction domain factor did not have a significant effect on childhood happiness. However, the children’s school environment and interactions with friends life-satisfaction domain factors explained over half of a steep decline in childhood happiness that is three times larger than what we see in older Australians. Adding personality to the model revealed what we expect to see with adults, namely that extraverted children are happier, but unexpectedly, so are conscientious children. With the steep decline in the happiness of young Australians revealed and explanations offered, the third study builds on the time-invariant individual trait finding from the first study by applying the Australian panel data to an Aggregate Model of Average Happiness over the lifetime. The model’s independent variable is the stress that arises from the interaction between personality and the life event shocks that affect individuals and peers throughout their lives. Interestingly, a graphic depiction of the stress-in-age relationship reveals an inverse U-shape, one that looks like the opposite of the U-shape of happiness in age we saw in the first study. The stress arising from life event shocks is found to explain much of the change in average happiness over a lifetime.
With the policy recommendations of economists potentially invoking unexpected changes in our lives, the ensuing stress and resulting (un)happiness warrant consideration before economists make policy recommendations.
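The abstract's key methodological point is that fixed-effect analysis removes time-invariant individual traits from panel data. One common way to obtain this effect is the "within" transformation, in which each person's own mean is subtracted from their observations. The sketch below illustrates that idea only; it is an assumption that the thesis used this particular estimator, and the identifiers and scores are hypothetical.

```python
from collections import defaultdict


def within_transform(person_ids, values):
    """Subtract each person's own mean from that person's observations.

    Any trait that is constant over time for a person (and therefore
    shifts all of that person's values equally) is removed by this step.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for pid, value in zip(person_ids, values):
        sums[pid] += value
        counts[pid] += 1
    return [value - sums[pid] / counts[pid]
            for pid, value in zip(person_ids, values)]


# Hypothetical happiness scores for two panel members over two waves.
ids = ["a", "a", "b", "b"]
happiness = [1.0, 3.0, 10.0, 14.0]
print(within_transform(ids, happiness))
```

Regressing demeaned happiness on demeaned age then uses only variation within each person over time, which is why a cross-sectional pattern can disappear once fixed effects are included.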
46

Morris, Christopher Robert. "Data integration in the rail domain." Thesis, University of Birmingham, 2018. http://etheses.bham.ac.uk//id/eprint/8204/.

Abstract:
The exchange of information is crucial to the operation of railways; starting with the distribution of timetables, information must constantly be exchanged in any railway network. The slow evolution of the information environment within the rail industry has resulted in a diverse range of systems that are only able to exchange information essential to railway operations. Were the cost of data integration reduced, further cost reductions and improvements to customer service would follow as barriers to the adoption of other technologies are removed. The need for data integration has already been studied extensively and has been included in the UK industry's rail technical strategy; however, despite its identification as a key technique for improving integration, uptake of ontology remains limited. This thesis considers techniques to reduce barriers to the take-up of ontology in the UK rail industry, and presents a case study in which these techniques are applied. Among the key barriers to uptake identified are a lack of software engineers with ontology experience and the diverse information environment within the rail domain. Techniques to overcome these barriers using software-based tools are considered, and example tools are produced to help overcome them. The case study presented is of a degraded mode signalling system, drawing data from a range of diverse sources, integrated using an ontology. Tools created to improve data integration are employed in this commercial project, successfully combining signalling data with (simulated) train positioning data.
47

Rajabi, Hanieh. "Secure conditional cross-domain data sharing." Doctoral thesis, Università degli Studi di Roma "Tor Vergata", 2013. http://hdl.handle.net/2108/204177.

Abstract:
Defensive techniques against Internet-scale attacks can significantly benefit from sharing network security data among different domains. However, cross-domain collaborative security is affected by a native dichotomy. On one side, sharing of monitoring data across domains may significantly help in detecting large-scale threats and attacks; on the other side, data sharing conflicts with the need to protect network customers' privacy and the confidentiality of business and operational information. In this thesis, we address the challenges of sharing network security data and propose two distinct approaches that enable what we call conditional data sharing, i.e., they permit cross-domain sharing of fine-grained organized subsets of network security data only when a global attack is ongoing in the network and multiple domain contributors are ready to reveal their data for the same incident. In the first, so-called threshold-based approach, we propose a cryptographic construction devised to permit disclosure of cross-domain shared fine-grained organized subsets of network monitoring data only when a threshold number of domains agree on the data disclosure. The proposed approach revolves around a careful combination of distributed threshold-based cryptography with identity-based encryption. Protection is accomplished by "simply" using different cryptographic keys per monitoring feed, and automatically permitting per-feed key reconstruction upon the occurrence of independent and asynchronous per-domain/per-feed alerts. Because the threshold-based approach is rigidly limited in how data can be disclosed, we significantly extend the underlying cryptographic approach so as to support disclosure not only for threshold-based policies, but for more general (monotone) access structures. We further show that both solutions appear scalable and easy to deploy, requiring neither a-priori identification of monitoring data feeds nor explicit coordination among domains.
We apply this technique to a realistic scenario of whitelist sharing for DDoS mitigation. In this scenario, domains broadcast, for each possible DDoS target, the set of legitimate customers (client IP addresses) whose traffic should not be blocked while a DDoS attack is in progress. However, such a fine-grained whitelist sharing approach appears hardly appealing (to say the least) to operators; not only does the indiscriminate sharing of customers' addresses raise privacy concerns, but it also discloses, to competitor domains, business-critical information on the identity and activity of customers. Appendix A lists my publications related to the three contributions of this PhD thesis.
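The threshold idea in this abstract (a per-feed key becomes reconstructible only once enough domains contribute) can be illustrated with Shamir secret sharing. This is a generic textbook technique swapped in for illustration; it is not the thesis's construction, which combines threshold cryptography with identity-based encryption. The field size, function names and the sample key below are assumptions.

```python
import secrets

# Mersenne prime 2**127 - 1; the field size is an illustrative choice.
PRIME = 2**127 - 1


def split_secret(secret, threshold, num_shares):
    """Split `secret` into shares; any `threshold` of them recover it."""
    # Random polynomial of degree threshold - 1 with constant term = secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, num_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation modulo PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def reconstruct(shares):
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


# Hypothetical feed key shared among 5 domains with threshold 3.
feed_key = 123456789
shares = split_secret(feed_key, threshold=3, num_shares=5)
print(reconstruct(shares[:3]) == feed_key)
```

In this sketch each domain would hold one share of a feed's key and release it only when it raises an alert for that feed, so the key becomes recoverable once the threshold number of alerts has occurred.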
48

Su, Weifeng. "Domain-based data integration for Web databases /." View abstract or full-text, 2007. http://library.ust.hk/cgi/db/thesis.pl?CSED%202007%20SU.

49

Varga, Andrea. "Exploiting domain knowledge for cross-domain text classification in heterogeneous data sources." Thesis, University of Sheffield, 2014. http://etheses.whiterose.ac.uk/7538/.

Abstract:
With the growing amount of data generated in large heterogeneous repositories (such as the World Wide Web, corporate repositories and citation databases), there is an increased need for end users to locate relevant information efficiently. Text Classification (TC) techniques provide automated means for classifying fragments of text (phrases, paragraphs or documents) into predefined semantic types, allowing an efficient way of organising and analysing such large document collections. Current approaches to TC rely on supervised learning, which performs well on the domains on which the TC system is built, but tends to adapt poorly to different domains. This thesis presents a body of work exploring adaptive TC techniques across heterogeneous corpora in large repositories, with the goal of finding novel ways of bridging the gap across domains. The proposed approaches rely on the exploitation of domain knowledge for the derivation of stable cross-domain features. This thesis also investigates novel ways of estimating the performance of a TC classifier by means of domain similarity measures. For this purpose, two novel knowledge-based similarity measures are proposed that capture the usefulness of the selected cross-domain features for cross-domain TC. The evaluation of these approaches and measures is presented on real-world datasets against various strong baseline methods and content-based measures used in transfer learning. This thesis explores how domain knowledge can be used to enhance the representation of documents to address the lexical gap across the domains. Given that the effectiveness of a text classifier largely depends on the availability of annotated data, this thesis explores techniques which can leverage data from social knowledge sources (such as DBpedia and Freebase).
Techniques are further presented which explore the feasibility of exploiting different semantic graph structures from knowledge sources in order to create novel cross-domain features and domain similarity metrics. The methodologies presented provide a novel representation of documents, and exploit four wide-coverage knowledge sources: DBpedia, Freebase, SNOMED-CT and MeSH. The contribution of this thesis demonstrates the feasibility of exploiting domain knowledge for adaptive TC and domain similarity, providing an enhanced representation of documents with semantic information about entities that can indeed reduce the lexical differences between domains.
50

Domeniconi, Giacomo <1986>. "Data and Text Mining Techniques for In-Domain and Cross-Domain Applications." Doctoral thesis, Alma Mater Studiorum - Università di Bologna, 2016. http://amsdottorato.unibo.it/7494/1/domeniconi_giacomo_tesi.pdf.

Abstract:
In the big data era, a vast amount of data has been generated in different domains, from social media to news feeds, from health care to genomic functionalities. When addressing a problem, we usually need to harness multiple disparate datasets. Data from different domains may follow different modalities, each of which has a different representation, distribution, scale and density. For example, text is usually represented as discrete sparse word count vectors, whereas an image is represented by pixel intensities, and so on. Many Data Mining and Machine Learning techniques have been proposed in the literature and have already achieved significant success in many knowledge engineering areas, including classification, regression and clustering. However, some challenging issues remain when tackling a new problem: how should the problem be represented? Which approach is best among the huge number of possibilities? What information should be used in the Machine Learning task, and how should it be represented? Are there other domains from which knowledge can be borrowed? This dissertation proposes some possible representation approaches for problems in different domains, from text mining to genomic analysis. In particular, one of the major contributions is a different way to represent a classical classification problem: instead of using an instance related to each object (a document, a gene, a social post, etc.) to be classified, it is proposed to use a pair of objects or an object-class pair, using the relationship between them as the label. The application of this approach is tested on both flat and hierarchical text categorization datasets, where it potentially allows the efficient addition of new categories during classification. Furthermore, the same idea is used to extract conversational threads from an unregulated pool of messages and also to classify the biomedical literature based on the genomic features treated.
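The pair-based representation described in this abstract can be sketched as a simple data transformation: each labelled object is expanded into one instance per (object, class) pair, and the label records whether the pair is related. The function name, the binary label encoding and the toy documents below are assumptions made for illustration; the dissertation's actual features and relationship labels are not given in the abstract.

```python
def to_pair_instances(documents, labels, categories):
    """Expand (document, label) data into (document, category) pair instances.

    Each pair gets label 1 if the document belongs to that category,
    otherwise 0, so a single binary model can score any pair.
    """
    pairs = []
    for doc, label in zip(documents, labels):
        for category in categories:
            pairs.append(((doc, category), int(label == category)))
    return pairs


# Hypothetical toy data.
docs = ["gene expression study", "football match report"]
labels = ["biology", "sport"]
categories = ["biology", "sport"]

for instance in to_pair_instances(docs, labels, categories):
    print(instance)
```

Because the category is part of the input rather than a fixed output slot, a model trained on such pairs can in principle score pairs involving a category added later, which matches the abstract's remark about adding new categories during classification.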