
Dissertations / Theses on the topic 'Bayes predictor'



Consult the top 50 dissertations / theses for your research on the topic 'Bayes predictor.'

Next to every source in the list of references there is an 'Add to bibliography' button. Press it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Zerbeto, Ana Paula. "Melhor preditor empírico aplicado aos modelos beta mistos." Universidade de São Paulo, 2014. http://www.teses.usp.br/teses/disponiveis/45/45133/tde-09042014-132109/.

Full text
Abstract:
Os modelos beta mistos são amplamente utilizados na análise de dados que apresentam uma estrutura hierárquica e que assumem valores em um intervalo restrito conhecido. Com o objetivo de propor um método de predição dos componentes aleatórios destes, os resultados previamente obtidos na literatura para o preditor de Bayes empírico foram estendidos aos modelos de regressão beta com intercepto aleatório normalmente distribuído. O denominado melhor preditor empírico (MPE) proposto tem aplicação em duas situações diferentes: quando se deseja fazer predição sobre os efeitos individuais de novos elementos de grupos que já fizeram parte da base de ajuste e quando os grupos não pertenceram à tal base. Estudos de simulação foram delineados e seus resultados indicaram que o desempenho do MPE foi eficiente e satisfatório em diversos cenários. Ao utilizar-se da proposta na análise de dois bancos de dados da área da saúde, observou-se os mesmos resultados obtidos nas simulações nos dois casos abordados. Tanto nas simulações, quanto nas análises de dados reais, foram observados bons desempenhos. Assim, a metodologia proposta se mostrou promissora para o uso em modelos beta mistos, nos quais se deseja fazer predições.
Mixed beta regression models are extensively used to analyse data with a hierarchical structure that take values in a restricted, known interval. In order to propose a prediction method for their random components, results previously obtained in the literature for the empirical Bayes predictor were extended to beta regression models with a normally distributed random intercept. The proposed predictor, called the empirical best predictor (EBP), can be applied in two situations: when the interest is to predict individual effects for new elements of groups that were already part of the fitting data, and also for elements of new groups. Simulation studies were designed and their results indicated that the performance of the EBP was efficient and satisfactory in most scenarios. When the proposal was used to analyse two health databases, the same behaviour observed in the simulations was found in both cases, with good performance. Thus, the proposed method is promising for use in predictions with mixed beta regression models.
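The empirical best predictor described above is, in essence, the posterior mean of the random intercept given a group's observed responses, with unknown model parameters replaced by their estimates. A minimal numerical sketch of that idea, assuming a logit link, beta responses with precision phi and a normal random intercept (the function name and parameter values below are illustrative, not taken from the thesis):

import numpy as np
from scipy import stats
from scipy.integrate import quad

def ebp_random_intercept(y, eta_fixed, phi, sigma_u):
    # Posterior mean E[u | y] for a beta model with logit link, responses
    # Beta(mu*phi, (1-mu)*phi) and a N(0, sigma_u^2) random intercept.
    # All parameters are assumed to be plug-in estimates from a fitted model.
    def joint(u):
        mu = 1.0 / (1.0 + np.exp(-(eta_fixed + u)))            # inverse logit
        dens = stats.beta.pdf(y, mu * phi, (1.0 - mu) * phi)   # density of each response
        return np.prod(dens) * stats.norm.pdf(u, 0.0, sigma_u)
    num, _ = quad(lambda u: u * joint(u), -10 * sigma_u, 10 * sigma_u)
    den, _ = quad(joint, -10 * sigma_u, 10 * sigma_u)
    return num / den

# three bounded responses from one group, with illustrative parameter estimates
print(ebp_random_intercept(np.array([0.62, 0.70, 0.55]), eta_fixed=0.3, phi=20.0, sigma_u=0.5))

For a group that contributed no data, the integral reduces to the prior mean of the intercept, zero, which matches the intuition that nothing group-specific can be predicted yet.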
APA, Harvard, Vancouver, ISO, and other styles
2

Ayme, Alexis. "Supervised learning with missing data : a non-asymptotic point of view." Electronic Thesis or Diss., Sorbonne université, 2024. http://www.theses.fr/2024SORUS252.

Full text
Abstract:
Les valeurs manquantes sont courantes dans la plupart des ensembles de données du monde réel, en raison de la combinaison de sources multiples et d'informations intrinsèquement manquantes, telles que des défaillances de capteurs ou des questions d'enquête sans réponse. La présence de valeurs manquantes empêche souvent l'application d'algorithmes d'apprentissage standard. Cette thèse examine les valeurs manquantes dans un contexte de prédiction, visant à obtenir des prédictions précises malgré l'occurrence de données manquantes dans les données d'apprentissage et de test. L'objectif de cette thèse est d'analyser théoriquement des algorithmes spécifiques pour obtenir des garanties d'échantillons finis. Nous dérivons des bornes inférieures minimax sur le risque des prédictions linéaires en présence de valeurs manquantes. Ces bornes inférieures dépendent de la distribution du motif de valeurs manquantes et peuvent croître de manière exponentielle avec la dimension. Nous proposons une méthode très simple consistant à appliquer la procédure des moindres carrés uniquement aux motifs de valeurs manquantes les plus fréquents. Une telle méthode simple se révèle être une procédure presque minimax-optimale, qui s'écarte de l'algorithme des moindres carrés appliqué à tous les motifs de valeurs manquantes. Par la suite, nous explorons la méthode de l'imputation puis régression, où l'imputation est effectuée en utilisant l'imputation naïve par zéro, et l'étape de régression est réalisée via des modèles linéaires, dont les paramètres sont appris via la descente de gradient stochastique. Nous démontrons que cette méthode très simple offre de fortes garanties pour des échantillons finis dans des contextes de grande dimension. Plus précisément, nous montrons que le biais de cette méthode est inférieur au biais de la régression ridge. Étant donné que la régression ridge est souvent utilisée en haute dimension, cela prouve que le biais des données manquantes (via l'imputation par zéro) est négligeable dans certains contextes de grande dimension. Enfin, nous étudions différents algorithmes pour gérer la classification linéaire en présence de données manquantes (régression logistique, perceptron, LDA). Nous prouvons que la LDA est le seul modèle qui peut être valide pour des données complètes et manquantes dans certains contextes génériques.
Missing values are common in most real-world data sets due to the combination of multiple sources and inherently missing information, such as sensor failures or unanswered survey questions. The presence of missing values often prevents the application of standard learning algorithms. This thesis examines missing values in a prediction context, aiming to achieve accurate predictions despite the occurrence of missing data in both training and test datasets. The focus of this thesis is to theoretically analyze specific algorithms to obtain finite-sample guarantees. We derive minimax lower bounds on the excess risk of linear predictions in the presence of missing values. Such lower bounds depend on the distribution of the missing pattern and can grow exponentially with the dimension. We propose a very simple method consisting in applying the least-squares procedure on the most frequent missing patterns only. Such a simple method turns out to be a near minimax-optimal procedure, which departs from the least-squares algorithm applied to all missing patterns. Following this, we explore the impute-then-regress method, where imputation is performed using naive zero imputation, and the regression step is carried out via linear models whose parameters are learned via stochastic gradient descent. We demonstrate that this very simple method offers strong finite-sample guarantees in high-dimensional settings. Specifically, we show that the bias of this method is lower than the bias of ridge regression. As ridge regression is often used in high dimensions, this proves that the bias of missing data (via zero imputation) is negligible in some high-dimensional settings. These findings are illustrated using random features models, which help us to precisely understand the role of dimensionality. Finally, we study different algorithms to handle linear classification in the presence of missing data (logistic regression, perceptron, LDA). We prove that LDA is the only model that can be valid for both complete and missing data in some generic settings.
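A minimal sketch of the impute-then-regress baseline analysed above (naive zero imputation followed by a linear model fitted by stochastic gradient descent) using scikit-learn; the synthetic data and hyperparameters are placeholders:

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.impute import SimpleImputer
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))
y = X @ rng.normal(size=20) + 0.1 * rng.normal(size=500)
X[rng.random(X.shape) < 0.3] = np.nan        # ~30% of entries missing completely at random

# naive zero imputation, then a linear model learned by stochastic gradient descent
model = make_pipeline(
    SimpleImputer(strategy="constant", fill_value=0.0),
    SGDRegressor(loss="squared_error", max_iter=1000, tol=1e-4, random_state=0),
)
model.fit(X, y)
print(model.predict(X[:5]))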
APA, Harvard, Vancouver, ISO, and other styles
3

Laws, David Joseph. "A Bayes decision theoretic approach to the optimal design of screens." Thesis, University of Newcastle Upon Tyne, 1997. http://hdl.handle.net/10443/648.

Full text
Abstract:
An item may be said to reach a standard suitable for use if it has some prescribed attributes. Suppose that a variable T measures the standard and that an item has the desired attributes when T lies in a prescribed region. The variable T may be very expensive to measure, and so some cheaper-to-measure screening variables, X say, correlated with T, may be used to classify items. The purpose of screen design is to determine CX, the region of X space for which an item should be said to reach the standard. If the error probabilities of classifying an item based on X are very high, it may be economical to measure T. Chapter 2 deals with this idea in the context of a very simple two-stage set-up in which, at the first stage of the screen, a univariate screening variable X is measured. Some items are sentenced as acceptable or unacceptable, and the remainder are passed on to the second stage at which T is determined. The optimal screen is found that minimises cost, where costs are given for misclassifying items and for measuring the variables. The variable T is assumed binary and the model for T|X is a probit regression model. In designing a two-stage screen, Chapter 3 considers: (a) a general stochastic structure for (T, X), (b) a general loss function set-up for misclassification costs and (c) no fixed form for the screen. Also in Chapter 3, we consider a scenario in which a statistical goal or constraint is imposed in addition to the decision-theoretic target of minimising expected cost. In Chapter 4 we consider a sequential screen that operates as follows. At each stage of a sequence a covariate is measured and items may be accepted as suitable, discarded or passed on to the next stage. At the final stage the performance variable T is measured. Returning to the simple one-stage screen based solely on measuring covariates, Chapter 5 poses the question of how many and which covariates to include as part of the screen.
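As an illustration of the decision-theoretic target, the expected cost of a one-stage screen with acceptance region C_X can be written (in notation that is ours, not the thesis's) as

E[cost(C_X)] = c_X + ∫_{C_X} ℓ_a · P(T outside the acceptable region | X = x) f(x) dx + ∫_{outside C_X} ℓ_r · P(T inside the acceptable region | X = x) f(x) dx,

where c_X is the cost of measuring X, ℓ_a and ℓ_r are the losses for wrongly accepting and wrongly rejecting an item, and f is the density of X. The optimal region accepts x whenever the first integrand is smaller than the second; measuring T directly, at extra cost, becomes attractive exactly where both integrands are large.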
APA, Harvard, Vancouver, ISO, and other styles
4

Wong, Hubert. "Small sample improvement over Bayes prediction under model uncertainty." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2000. http://www.collectionscanada.ca/obj/s4/f2/dsk2/ftp02/NQ56646.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Dahlgren, Lindström Adam. "Structured Prediction using Voted Conditional Random FieldsLink Prediction in Knowledge Bases." Thesis, Umeå universitet, Institutionen för datavetenskap, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-140692.

Full text
Abstract:
Knowledge bases are useful in the validation of automatically extracted information, and for hypothesis selection during the extraction process. Building knowledge bases is a difficult task and the process is bound to miss facts. Therefore, the existence of facts can be estimated using link prediction, i.e., by solving the structured prediction problem. It has been shown that combining directly observable features with latent features increases performance. Observable features include, e.g., the presence of another chain of facts leading to the same end point. Latent features include, e.g., properties that are not modelled by facts of the form subject-predicate-object, such as being a good actor. Observable graph features are modelled using the Path Ranking Algorithm, and latent features using the bilinear RESCAL model. Voted Conditional Random Fields can be used to combine feature families while taking into account their complexity to minimize the risk of training a poor predictor. We propose a combined model fusing these theories together with a complexity analysis of the feature families used. In addition, two simple feature families are constructed to model neighborhood properties. The model we propose captures useful features for link prediction, but needs further evaluation to guarantee efficient learning. Finally, suggestions for experiments and other feature families are given.
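The latent-feature side of the model above is RESCAL's bilinear score, in which each entity gets an embedding vector and each relation type a mixing matrix. A small numpy sketch, with made-up dimensions, entity count and relation name:

import numpy as np

rng = np.random.default_rng(0)
n_entities, d = 100, 8
A = rng.normal(size=(n_entities, d))            # one embedding row per entity
R = {"worksFor": rng.normal(size=(d, d))}       # one mixing matrix per relation type

def rescal_score(s, p, o):
    # bilinear RESCAL score a_s^T R_p a_o for the triple (subject, predicate, object)
    return A[s] @ R[p] @ A[o]

print(rescal_score(3, "worksFor", 17))
# rank every candidate object for subject 3 and relation "worksFor"
scores = A[3] @ R["worksFor"] @ A.T
print(np.argsort(scores)[-5:])                  # indices of the five highest-scoring objects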
APA, Harvard, Vancouver, ISO, and other styles
6

Liu, Benmei. "Hierarchical Bayes estimation and empirical best prediction of small-area proportions." College Park, Md.: University of Maryland, 2009. http://hdl.handle.net/1903/9149.

Full text
Abstract:
Thesis (Ph.D.) -- University of Maryland, College Park, 2009.
Thesis research directed by: Joint Program in Survey Methodology. Title from t.p. of PDF. Includes bibliographical references. Published by UMI Dissertation Services, Ann Arbor, Mich. Also available in paper.
APA, Harvard, Vancouver, ISO, and other styles
7

Bakal, Mehmet. "Relation Prediction over Biomedical Knowledge Bases for Drug Repositioning." UKnowledge, 2019. https://uknowledge.uky.edu/cs_etds/90.

Full text
Abstract:
Identifying new potential treatment options for medical conditions that cause human disease burden is a central task of biomedical research. Since all candidate drugs cannot be tested with animal and clinical trials, in vitro approaches are first attempted to identify promising candidates. Likewise, identifying other essential relations (e.g., causation, prevention) between biomedical entities is also critical to understand biomedical processes. Hence, it is crucial to develop automated relation prediction systems that can yield plausible biomedical relations to expedite the discovery process. In this dissertation, we demonstrate three approaches to predict treatment relations between biomedical entities for the drug repositioning task using existing biomedical knowledge bases. Our approaches can be broadly labeled as link prediction or knowledge base completion in computer science literature. Specifically, first we investigate the predictive power of graph paths connecting entities in the publicly available biomedical knowledge base, SemMedDB (the entities and relations constitute a large knowledge graph as a whole). To that end, we build logistic regression models utilizing semantic graph pattern features extracted from the SemMedDB to predict treatment and causative relations in Unified Medical Language System (UMLS) Metathesaurus. Second, we study matrix and tensor factorization algorithms for predicting drug repositioning pairs in repoDB, a general purpose gold standard database of approved and failed drug–disease indications. The idea here is to predict repoDB pairs by approximating the given input matrix/tensor structure where the value of a cell represents the existence of a relation coming from SemMedDB and UMLS knowledge bases. The essential goal is to predict the test pairs that have a blank cell in the input matrix/tensor based on the shared biomedical context among existing non-blank cells. Our final approach involves graph convolutional neural networks where entities and relation types are embedded in a vector space involving neighborhood information. Basically, we minimize an objective function to guide our model to concept/relation embeddings such that distance scores for positive relation pairs are lower than those for the negative ones. Overall, our results demonstrate that recent link prediction methods applied to automatically curated, and hence imprecise, knowledge bases can nevertheless result in high accuracy drug candidate prediction with appropriate configuration of both the methods and datasets used.
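Of the three approaches above, the matrix-factorization one is the easiest to sketch: approximate the known drug-disease matrix by a low-rank product and rank the blank cells by their reconstructed scores. A toy version with fabricated data (repoDB and SemMedDB are not used here):

import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
# toy drug x disease matrix: 1 where a TREATS relation is already known, 0 elsewhere
M = (rng.random((30, 40)) < 0.1).astype(float)

# low-rank factorization; reconstructed values of blank cells serve as link-prediction scores
model = NMF(n_components=5, init="nndsvda", max_iter=500, random_state=0)
W = model.fit_transform(M)
scores = W @ model.components_

masked = np.where(M == 0, scores, -np.inf)       # only rank pairs that are not already known
top = np.argsort(masked, axis=None)[::-1][:10]
print(np.column_stack(np.unravel_index(top, M.shape)))   # ten candidate drug-disease pairs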
APA, Harvard, Vancouver, ISO, and other styles
8

Khan, Imran Qayyum. "Simultaneous prediction of symptom severity and cause in data from a test battery for Parkinson patients, using machine learning methods." Thesis, Högskolan Dalarna, Datateknik, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:du-4586.

Full text
Abstract:
The main purpose of this thesis project is the prediction of symptom severity and cause from test-battery data of Parkinson's disease patients, based on data mining. The data were collected from a test battery administered on a handheld computer. We use the chi-square method to check which variables are important and which are not, and we then apply different data mining techniques to the normalised data to see which technique gives good results. The implementation of this thesis is in WEKA. The methods we used are Naïve Bayes, CART and KNN. We use Bland-Altman plots and Spearman's correlation to check the final results and the predictions: the Bland-Altman analysis indicates how well the stated confidence level is met in these data, and Spearman's correlation tells us how strong the relationship is. On the basis of the results and analysis we see that all three methods give nearly the same results, but CART (the J48 decision tree) gives good results, with under-predicted and over-predicted values lying between -2 and +2. The correlation between the actual and predicted values is 0.794 with CART. Cause gives a better percentage classification result than disability because it uses two classes.
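The workflow above was run in WEKA; a rough scikit-learn equivalent, with chi-square screening of the variables followed by a comparison of Naïve Bayes, a CART-style decision tree and KNN, and a synthetic stand-in for the test-battery data, might look as follows:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.preprocessing import MinMaxScaler
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

# synthetic stand-in for the normalised test-battery measurements
X, y = make_classification(n_samples=400, n_features=30, n_informative=8, random_state=0)

for name, clf in [("Naive Bayes", GaussianNB()),
                  ("CART", DecisionTreeClassifier(random_state=0)),
                  ("KNN", KNeighborsClassifier(n_neighbors=5))]:
    # chi2 requires non-negative inputs, hence the [0, 1] scaling before selection
    pipe = make_pipeline(MinMaxScaler(), SelectKBest(chi2, k=10), clf)
    print(name, round(cross_val_score(pipe, X, y, cv=5).mean(), 3))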
APA, Harvard, Vancouver, ISO, and other styles
9

Wang, Kai. "Novel computational methods for accurate quantitative and qualitative protein function prediction /." Thesis, Connect to this title online; UW restricted, 2005. http://hdl.handle.net/1773/11488.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Fredette, Marc. "Prediction of recurrent events." Thesis, University of Waterloo, 2004. http://hdl.handle.net/10012/1142.

Full text
Abstract:
In this thesis, we will study issues related to prediction problems and put an emphasis on those arising when recurrent events are involved. First we define the basic concepts of frequentist and Bayesian statistical prediction in the first chapter. In the second chapter, we study frequentist prediction intervals and their associated predictive distributions. We will then present an approach based on asymptotically uniform pivotals that is shown to dominate the plug-in approach under certain conditions. The following three chapters consider the prediction of recurrent events. The third chapter presents different prediction models when these events can be modeled using homogeneous Poisson processes. Amongst these models, those using random effects are shown to possess interesting features. In the fourth chapter, the time homogeneity assumption is relaxed and we present prediction models for non-homogeneous Poisson processes. The behavior of these models is then studied for prediction problems with a finite horizon. In the fifth chapter, we apply the concepts discussed previously to a warranty dataset coming from the automobile industry. The number of processes in this dataset being very large, we focus on methods providing computationally rapid prediction intervals. Finally, we discuss the possibilities of future research in the last chapter.
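For the homogeneous Poisson process models discussed in the third chapter, the plug-in approach mentioned above amounts to estimating the event rate and reading the prediction limits off a Poisson distribution with that estimated rate. A toy sketch with invented figures:

from scipy.stats import poisson

# observed: 18 recurrent events over 24 months of follow-up (invented figures)
events, follow_up = 18, 24.0
rate_hat = events / follow_up                    # maximum likelihood estimate of the event rate

horizon = 12.0                                   # predict the number of events in the next 12 months
mu = rate_hat * horizon
lower, upper = poisson.ppf([0.025, 0.975], mu)
print(f"plug-in 95% prediction interval for the next year: [{lower:.0f}, {upper:.0f}]")

As the thesis argues, such plug-in intervals ignore the uncertainty in the estimated rate, which is what the approach based on asymptotically uniform pivotals corrects.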
APA, Harvard, Vancouver, ISO, and other styles
11

Eldud, Omer Ahmed Abdelkarim. "Prediction of protein secondary structure using binary classification trees, naive Bayes classifiers and the Logistic Regression Classifier." Thesis, Rhodes University, 2016. http://hdl.handle.net/10962/d1019985.

Full text
Abstract:
The secondary structure of proteins is predicted using various binary classifiers. The data are adopted from the RS126 database. The original data consist of protein primary and secondary structure sequences encoded using alphabetic letters; these are re-encoded into unary vectors comprising ones and zeros only. Different binary classifiers, namely naive Bayes, logistic regression and classification trees, using hold-out and 5-fold cross validation, are trained on the encoded data. For each of the classifiers three classification tasks are considered, namely helix against not helix (H/∼H), sheet against not sheet (S/∼S) and coil against not coil (C/∼C). The performance of these binary classifiers is compared using the overall accuracy in predicting the protein secondary structure for various window sizes. Our results indicate that hold-out validation achieved higher accuracy than 5-fold cross validation. The naive Bayes classifier, using 5-fold cross validation, achieved the lowest accuracy for predicting helix against not helix. The classification tree classifiers, using 5-fold cross validation, achieved the lowest accuracies for both coil against not coil and sheet against not sheet. The logistic regression classifier's accuracy depends on the window size, with a positive relationship between accuracy and window size. The logistic regression classifier achieved the highest accuracy of the three classifiers for each classification task: 77.74 percent for helix against not helix, 81.22 percent for sheet against not sheet and 73.39 percent for coil against not coil. It is noted that it would be easier to compare classifiers if the classification process could be facilitated entirely in R. Alternatively, it would be easier to assess these logistic regression classifiers if SPSS had a function to determine the accuracy of the logistic regression classifier.
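A sketch of the encoding and of one of the binary tasks above, helix against not helix: each residue is represented by a sliding window of unary (one-hot) vectors and fed to a logistic regression classifier. The sequence, labels and window size are illustrative only:

import numpy as np
from sklearn.linear_model import LogisticRegression

AA = "ACDEFGHIKLMNPQRSTVWY"
IDX = {a: i for i, a in enumerate(AA)}

def windows(seq, labels, w=7):
    # one-hot encode a window of w residues centred on each position
    half, X, y = w // 2, [], []
    padded = "X" * half + seq + "X" * half          # pad the ends with a dummy residue
    for i, lab in enumerate(labels):
        vec = np.zeros(w * len(AA))
        for j, a in enumerate(padded[i:i + w]):
            if a in IDX:
                vec[j * len(AA) + IDX[a]] = 1.0
        X.append(vec)
        y.append(1 if lab == "H" else 0)            # helix against not helix
    return np.array(X), np.array(y)

X, y = windows("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ",
               "CCHHHHHHHHHHCCCEEEECCHHHHHHHHHHHC")
print(LogisticRegression(max_iter=1000).fit(X, y).score(X, y))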
APA, Harvard, Vancouver, ISO, and other styles
12

Getty, Kimberly Chapman. "Gender and Professional Experience as Predictors of Consultants' Likelihood of Use of Social Power Bases." NCSU, 2006. http://www.lib.ncsu.edu/theses/available/etd-04172006-105027/.

Full text
Abstract:
The social power typology originally identified by French and Raven (1959) and later modified by Raven (1965, 1992) was used to examine factors related to school psychological consultation. Specifically, this dissertation investigated whether the gender and amount of relevant professional experience of psychologists (i.e., consultants) and teachers (i.e., consultees) influenced how likely psychologists were to use soft power bases when consulting with teachers. In addition, this study examined whether consultants' use of soft power bases was related to their self-evaluations of effectiveness during consultation. Two instruments were employed: the Interpersonal Power Inventory (IPI), which was modified to examine school consultants' likelihood of use of social power bases when consulting with teachers; and the Consultant Evaluation Form (CEF), which was modified to assess psychologists' self-evaluations of effectiveness during teacher consultation. The IPI and CEF were mailed together to 1,000 Nationally Certified School Psychologists, and a total of 352 usable protocols were returned. Results indicated that when consulting with female teachers, female consultants were not more likely to use positive referent power than the other four soft power bases combined; however, male psychologists were more likely to use positive expert power than the other four soft power bases combined. Additional results indicated that consultants' likelihood of use of soft power bases was not related to their years of professional experience, although results of a secondary set of analyses using a slightly different constellation of soft power bases did yield a significant relationship between the two variables. Findings also revealed a significant relationship between consultees' years of experience and consultants' use of soft power bases, in that school consultants were less likely to use soft power with more experienced teachers. Finally, results indicated a significant, positive relationship between consultants' likelihood of use of soft power bases and their self-evaluations of effectiveness during consultation. Findings of this study suggest that the experience level of teachers plays a significant role in determining the influence strategies used by psychologists during consultation. Results also imply that consultants' use of soft power is related to perceptions of more effective school consultation.
APA, Harvard, Vancouver, ISO, and other styles
13

Kothawade, Rohan Dilip. "Wine quality prediction model using machine learning techniques." Thesis, Högskolan i Skövde, Institutionen för informationsteknologi, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-20009.

Full text
Abstract:
The quality of a wine is important for consumers as well as for the wine industry. The traditional (expert) way of measuring wine quality is time-consuming. Nowadays, machine learning models are important tools for replacing such human tasks. Several features can be used to predict wine quality, but not all of them are relevant for good prediction, so this thesis focuses on which wine features are important for obtaining promising results. For building the classification models and evaluating the relevant features, we used three algorithms, namely the support vector machine (SVM), naïve Bayes (NB), and artificial neural network (ANN). In this study we used two wine quality datasets, red wine and white wine. To evaluate feature importance we used the Pearson correlation coefficient, and performance measures such as accuracy, recall, precision, and F1 score to compare the machine learning algorithms. A grid search algorithm was applied to improve model accuracy. Finally, we found that the artificial neural network (ANN) algorithm gives better prediction results than the support vector machine (SVM) and naïve Bayes (NB) algorithms for both the red wine and white wine datasets.
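A condensed scikit-learn version of the evaluation loop described above (Pearson-correlation feature screening, an SVM tuned by grid search, and the usual accuracy, precision, recall and F1 report) could look like this; the file path, the binarisation of the quality score and the hyperparameter grid are all placeholders:

import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.metrics import classification_report

df = pd.read_csv("winequality-red.csv", sep=";")           # placeholder path to a wine-quality file
y = (df["quality"] >= 6).astype(int)                       # illustrative "good" vs "not good" target
X = df.drop(columns="quality")

# Pearson correlation with the target, used to keep the most relevant physicochemical features
corr = X.corrwith(y).abs().sort_values(ascending=False)
X = X[corr.index[:6]]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0, stratify=y)
grid = GridSearchCV(make_pipeline(StandardScaler(), SVC()),
                    {"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.1, 0.01]}, cv=5)
grid.fit(X_tr, y_tr)
print(classification_report(y_te, grid.predict(X_te)))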
APA, Harvard, Vancouver, ISO, and other styles
14

Li, Qiuxiang. "Orthologous pair transfer and hybrid Bayes methods to predict the protein-protein interaction network of the Anopheles gambiae mosquitoes." Thesis, Imperial College London, 2008. http://hdl.handle.net/10044/1/4635.

Full text
Abstract:
Based on the published protein-protein interaction maps of five organisms and other public databases for domain-domain and protein-protein interactions, two new approaches are proposed to infer the protein-protein interaction network of the Anopheles gambiae (A. gambiae) mosquito. Our main contributions are: i) adopting an orthologous protein pair transfer method that has so far not been seen in the literature; ii) proposing a new hybrid Bayes method; iii) using voting machines at two levels of the combined classifier/predictor; iv) using heterogeneous datasets as the training data; and v) using the trained classifier to predict the protein interaction maps of A. gambiae, arguably one of the least known organisms in terms of protein interaction mechanisms. With the first method, the orthologous and in-paralogous protein clusters are extracted for both species. The relations between two peer-to-peer proteins in the two species are identified so that the interactions in the D. melanogaster protein interaction maps are transferred to pairs of interacting proteins in A. gambiae. The second strategy, namely the hybrid Bayes method, is based on the domain composition of proteins, with which we utilize a probability model to build virtual domain-domain maps by integrating large-scale protein interaction data from five organisms, namely Saccharomyces cerevisiae, Caenorhabditis elegans, Escherichia coli, Mus musculus and Drosophila melanogaster. Once the virtual domain-domain interaction maps are constructed, we propose two ways to predict the protein-protein interaction maps. These two methods are compared and then combined to form a voting machine that collectively decides a protein pair's candidacy. Users can adjust the weights for the different methods to flexibly control the output, and parameters are chosen by running different experiments on the training data set. While both the orthologous cluster and hybrid Bayes methods produce encouraging results, the second one predicts more protein-protein interactions than the first, yet the two resulting data sets share only a very small fraction of common interactions. We therefore adopt a second voting machine and calibrate its parameters with the putative protein interaction data. Those parameters are then used to predict the protein-protein interaction maps of A. gambiae and produce reasonably good results.
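The voting machine mentioned above reduces, for each candidate protein pair, to a weighted combination of the two methods' outputs compared against a threshold; a toy sketch in which the weights, threshold and gene identifiers are illustrative:

def weighted_vote(pair, transfer_pairs, bayes_scores, w_transfer=0.6, w_bayes=0.4, threshold=0.5):
    # Combine the orthologous-transfer prediction (binary) with the hybrid-Bayes score (0-1).
    # Weights and threshold are illustrative and would be calibrated on training data.
    score = w_transfer * (1.0 if pair in transfer_pairs else 0.0) + w_bayes * bayes_scores.get(pair, 0.0)
    return score >= threshold, score

transfer_pairs = {("AGAP000123", "AGAP004567")}          # hypothetical A. gambiae gene identifiers
bayes_scores = {("AGAP000123", "AGAP004567"): 0.9, ("AGAP000123", "AGAP009999"): 0.7}
for pair in bayes_scores:
    print(pair, weighted_vote(pair, transfer_pairs, bayes_scores))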
APA, Harvard, Vancouver, ISO, and other styles
15

Wilms, Christoph [Verfasser], Daniel [Akademischer Betreuer] Hoffmann, and Peter [Akademischer Betreuer] Bayer. "Methods for the prediction of complex biomolecular structures / Christoph Wilms. Gutachter: Peter Bayer. Betreuer: Daniel Hoffmann." Duisburg, 2014. http://d-nb.info/1048087301/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
16

LEGENDRE, JEAN-FRANCOIS. "Etude de modeles de prediction de la propagation bases sur la theorie geometrique de la diffraction." Rennes, INSA, 1995. http://www.theses.fr/1995ISAR0001.

Full text
Abstract:
The work presented in this thesis concerns the development of propagation prediction models based on asymptotic theories, which make it possible to compute the propagation attenuation over a terrain profile or a surface as a function of the physical and electrical parameters of the link. First, geometrical optics and the geometrical theory of diffraction are presented. This literature review makes it possible to establish the general expression of the total received field using the formalism inherent to ray methods. We then address the software implementation of the first prediction model, which uses the 2D terrain profile between the transmitter and the receiver. The set of rays is determined with an original ray-launching technique based on the notion of a flow graph. After showing certain theoretical deficiencies of this model, we describe the modifications required to apply it to real terrain profiles; the 2D model is thus enriched with a morphological library describing the nature of the terrain and with new diffraction coefficients. This tool was applied to nearly a hundred point-to-point links, and the simulation results are compared with experimental and theoretical results obtained with other methods. The influence of terrain-height inaccuracies on the computation of the total field is also studied. Finally, we propose a three-dimensional extension of the propagation prediction model. The new ray launching is based on image theory or on solving Fermat's equations with a gradient method. In addition, the 2D model was adapted to handle outdoor and indoor configurations based on the transverse cross-section of the terrain.
APA, Harvard, Vancouver, ISO, and other styles
17

Warsitha, Tedy, and Robin Kammerlander. "Analyzing the ability of Naive-Bayes and Label Spreading to predict labels with varying quantities of training data : Classifier Evaluation." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-188132.

Full text
Abstract:
A study was performed on Naive-Bayes and Label Spreading methods applied as classifiers in a spam filter. In the testing procedure their ability to predict was observed and the results were compared in a McNemar test, leading to the discovery of the strengths and weaknesses of the chosen methods in an environment of varying training data. Though the results were inconclusive due to resource restrictions, the theory is discussed from various angles in order to provide a better understanding of the conditions that can lead to potentially different results between the chosen methods, opening up for improvement and further studies. The conclusion of this study is that a significant difference exists between the two classifiers in terms of their ability to predict labels. On a secondary note, it is recommended to choose a classifier depending on the available training data and computational power.
En studie utfördes på klassifieringsmetoderna Naive-Bayes och Label Spreading applicerade i ett spam filter. Metodernas förmåga att predicera observerades och resultaten jämfördes i ett McNemar test, vilket ledde till upptäckten av styrkorna och svagheterna av de valda metoderna i en miljö med varierande träningsdata. Fastän resultaten var ofullständiga på grund av bristfälliga resurser, så diskuteras den bakomliggande teorin utifrån flera vinklar. Denna diskussion har målet att ge en bättre förståelse kring de bakomliggande förutsättningarna som kan leda till potentiellt annorlunda resultat för de valda metoderna. Vidare öppnar detta möjligheter för förbättringar och framtida studier. Slutsatsen som dras av denna studie är att signifikanta skillnader existerar i förmågan att kunna predicera klasser mellan de två valda klassifierarna. Den slutgiltiga rekommendationen blir att välja en klassifierare utifrån utbudet av träningsdata och tillgängligheten av datorkraft.
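A compact sketch of the comparison described above, a supervised Naive Bayes text classifier against semi-supervised Label Spreading with McNemar's test applied to their test-set errors, using a public corpus as a stand-in for the spam data:

import numpy as np
from sklearn.datasets import fetch_20newsgroups        # stand-in corpus, not the spam data of the thesis
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.semi_supervised import LabelSpreading
from statsmodels.stats.contingency_tables import mcnemar

data = fetch_20newsgroups(subset="train", categories=["sci.med", "rec.autos"])
X = CountVectorizer(max_features=2000).fit_transform(data.data).toarray()
X_tr, X_te, y_tr, y_te = train_test_split(X, data.target, test_size=0.3, random_state=0)

# hide 80% of the training labels: Label Spreading can still use those points, Naive Bayes cannot
y_semi = y_tr.copy()
hidden = np.random.default_rng(0).random(len(y_tr)) < 0.8
y_semi[hidden] = -1

nb = MultinomialNB().fit(X_tr[~hidden], y_tr[~hidden])
ls = LabelSpreading(kernel="knn", n_neighbors=7).fit(X_tr, y_semi)

a, b = nb.predict(X_te) == y_te, ls.predict(X_te) == y_te
table = [[np.sum(a & b), np.sum(a & ~b)], [np.sum(~a & b), np.sum(~a & ~b)]]
print(mcnemar(table, exact=True))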
APA, Harvard, Vancouver, ISO, and other styles
18

Vavilikolanu, Srutha. "Crash Prediction Models on Truck-Related Crashes on Two-lane Rural Highways with Vertical Curves." University of Akron / OhioLINK, 2008. http://rave.ohiolink.edu/etdc/view?acc_num=akron1221758522.

Full text
APA, Harvard, Vancouver, ISO, and other styles
19

Schram, Christophe. "Aeroacoustics of subsonic jets: Prediction of the sound produced by vortex pairing based on particle image velocimetry." Doctoral thesis, Universite Libre de Bruxelles, 2003. http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/211341.

Full text
APA, Harvard, Vancouver, ISO, and other styles
20

Knecht, Casey Scott. "Crash Prediction Modeling for Curved Segments of Rural Two-Lane Two-Way Highways in Utah." BYU ScholarsArchive, 2014. https://scholarsarchive.byu.edu/etd/4352.

Full text
Abstract:
This thesis contains the results of the development of crash prediction models for curved segments of rural two-lane two-way highways in the state of Utah. The modeling effort included the calibration of the predictive model found in the Highway Safety Manual (HSM) as well as the development of Utah-specific models developed using negative binomial regression. The data for these models came from randomly sampled curved segments in Utah, with crash data coming from years 2008-2012. The total number of randomly sampled curved segments was 1,495. The HSM predictive model for rural two-lane two-way highways consists of a safety performance function (SPF), crash modification factors (CMFs), and a jurisdiction-specific calibration factor. For this research, two sample periods were used: a three-year period from 2010 to 2012 and a five-year period from 2008 to 2012. The calibration factor for the HSM predictive model was determined to be 1.50 for the three-year period and 1.60 for the five-year period. These factors are to be used in conjunction with the HSM SPF and all applicable CMFs. A negative binomial model was used to develop Utah-specific crash prediction models based on both the three-year and five-year sample periods. A backward stepwise regression technique was used to isolate the variables that would significantly affect highway safety. The independent variables used for negative binomial regression included the same set of variables used in the HSM predictive model along with other variables such as speed limit and truck traffic that were considered to have a significant effect on potential crash occurrence. The significant variables at the 95 percent confidence level were found to be average annual daily traffic, segment length, total truck percentage, and curve radius. The main benefit of the Utah-specific crash prediction models is that they provide a reasonable level of accuracy for crash prediction yet only require four variables, thus requiring much less effort in data collection compared to using the HSM predictive model.
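Two pieces of the methodology above can be written down compactly: the HSM calibration factor is the ratio of observed to HSM-predicted crashes over the sampled curves, and the Utah-specific model is a negative binomial regression of observed crashes on AADT, segment length, truck percentage and curve radius. A hedged sketch with made-up file and column names:

import pandas as pd
import statsmodels.api as sm

curves = pd.read_csv("utah_curves.csv")          # placeholder: one row per sampled curved segment

# HSM calibration factor: observed crashes divided by crashes predicted by the uncalibrated HSM model
C = curves["observed_crashes"].sum() / curves["hsm_predicted_crashes"].sum()
print(f"calibration factor: {C:.2f}")

# Utah-specific safety performance function via negative binomial regression (log link by default)
X = sm.add_constant(curves[["aadt", "segment_length", "truck_pct", "curve_radius"]])
nb = sm.GLM(curves["observed_crashes"], X, family=sm.families.NegativeBinomial()).fit()
print(nb.summary())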
APA, Harvard, Vancouver, ISO, and other styles
21

Hátle, Lukáš. "Využití Bayesovských sítí pro predikci korporátních bankrotů." Master's thesis, Vysoká škola ekonomická v Praze, 2014. http://www.nusl.cz/ntk/nusl-192331.

Full text
Abstract:
The aim of this study is to evaluate the feasibility of using Bayes classifiers for predicting corporate bankruptcies. The results obtained show that Bayes classifiers reach results comparable to those of the more commonly used methods, such as logistic regression and decision trees. The comparison was carried out on Czech and Polish data sets. The overall accuracy of these so-called naive Bayes classifiers, using entropic discretization along with hybrid pre-selection of the explanatory attributes, reaches 77.19 % for the Czech dataset and 79.76 % for the Polish set, respectively. The AUC values for these data sets are 0.81 and 0.87. The results obtained for the Polish data set were compared with the already published articles by Tsai (2009) and Wang et al. (2014), who applied different classification algorithms; the method proposed in this study compares quite favourably with those earlier works. The thesis also compares various approaches to the discretisation of numerical attributes and to the selection of the relevant explanatory attributes, which are the key issues for increasing the performance of naive Bayes classifiers.
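A rough analogue of the classifier above, discretised financial ratios fed to a naive Bayes model and scored by accuracy and AUC, can be sketched in scikit-learn. Note that scikit-learn does not ship the entropy-based (MDL) discretisation or the hybrid attribute pre-selection used in the thesis, so quantile binning stands in for them here, and the data file is a placeholder:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.pipeline import make_pipeline
from sklearn.naive_bayes import BernoulliNB
from sklearn.metrics import accuracy_score, roc_auc_score

df = pd.read_csv("bankruptcy_ratios.csv")        # placeholder: financial ratios plus a 'bankrupt' flag
X, y = df.drop(columns="bankrupt"), df["bankrupt"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

clf = make_pipeline(
    KBinsDiscretizer(n_bins=5, encode="onehot-dense", strategy="quantile"),  # stand-in for MDL discretisation
    BernoulliNB(),
)
clf.fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))
print("AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))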
APA, Harvard, Vancouver, ISO, and other styles
22

Al, Takash Ahmad. "Development of Numerical Methods to Accelerate the Prediction of the Behavior of Multiphysics under Cyclic Loading." Thesis, Chasseneuil-du-Poitou, Ecole nationale supérieure de mécanique et d'aérotechnique, 2018. http://www.theses.fr/2018ESMA0014/document.

Full text
Abstract:
La réduction du temps de calcul lors de la résolution de problèmes d’évolution dans le cadre du calcul de structure constitue un enjeu majeur pour, par exemple, la mise en place de critères de rupture des pièces dans le secteur de l’aéronautique et de l’automobile. En particulier, la prédiction du cycle stabilisé des polymères sollicités sous chargement cyclique nécessite de résoudre un problème thermo-viscoélastique à grand nombre de cycles. La présence de différentes échelles de temps telles que le temps de relaxation (viscosité), le temps caractéristique associé au problème thermique et le temps du cycle de chargement conduit à un temps de calcul significatif lorsqu’un schéma incrémental est utilisé comme c’est le cas avec la méthode des éléments finis (MEF). De plus, un nombre important de données doit être stocké (au moins à chaque cycle). L’objectif de cette thèse est de proposer de nouvelles méthodes ainsi que d’étendre des méthodes existantes. Il est choisi de résoudre un problème thermique transitoire cyclique impliquant différentes échelles de temps avec l’objectif de réduire le temps de calcul réduit. Les méthodes proposées font partie des méthodes de réduction de modèles. Tout d’abord, la méthode de décomposition propre généralisée(PGD) a été étendue à un problème transitoire cyclique 3D non linéaire, la non-linéarité a été traitée en combinant la méthode PGD à la Méthode d’interpolation empirique discrète (DEIM), stratégie numérique déjà proposée dans la littérature. Les résultats ont montré l’efficacité de la PGD pour générer des résultats précis par rapport à la solution FEM avec une erreur relative inférieure à (1%). Ensuite, afin de réduire le temps de calcul, une autre approche alternative a été développée. Cette approche est basée sur l’utilisation d’une collection de modes, les modes les plus significatifs, issus de solutions PGD pour différentes échelles de temps et différentes valeurs de paramètres. Un dictionnaire regroupant ces modes est alors utilisé pour construire des solutions pour différents temps caractéristiques et différentes conditions aux limites, uniquement par projection de la solution sur les modes du dictionnaire. Cette approche a été adaptée pour traiter un problème faiblement couplé diffuso-thermique. La nouveauté de cette approche est de considérer un dictionnaire composé de bases spatio-temporelles et non pas uniquement de bases spatiales comme dans la fameuse méthode POD. Les résultats obtenus avec cette approche sont précis et permettent une réduction notable du temps de calcul on line. Néanmoins, lorsque différents temps de cycles sont considérés, le nombre de modes dans le dictionnaire augmente, ce qui en limite son utilisation. Afin de pallier cette limitation,une troisième stratégie numérique est proposée dans cette thèse. Elle consiste à considérer comme a priori connues des bases temporelles, elle est appelée stratégie mixte. L’originalité dans cette approche réside dans la construction de la base temporelle a prior basée sur l’analyse de Fourier de différentes simulations pour différents temps et différentes valeurs des paramètres. Une fois cette étude réalisée, une expression analytique des bases temporelles fonction des paramètres tels que le temps caractéristique et le temps du cycle est proposée. Les bases spatiales associées sont calculées à l’aide d’un algorithme type PGD. 
Cette méthode est ensuite testée pour la résolution de problèmes thermiques 3D sous chargement cyclique linéaires et non linéaires et un problème faiblement couplé thermo-diffusion
In the framework of structural calculation, the reduction of computation time plays an important role in the proposition of failure criteria in the aeronautic and automobile domains. In particular, the prediction of the stabilized cycle of polymers under cyclic loading requires solving a thermo-viscoelastic problem over a high number of cycles. The presence of different time scales, such as the relaxation time (viscosity), the thermal characteristic time and the cycle time (loading), leads to a huge computation time when an incremental scheme is used, as with the Finite Element Method (FEM), and a large amount of memory is needed for data storage. The objective of this thesis is to propose new techniques and to extend existing ones. A transient thermal problem with different time scales is considered with the aim of reducing computation time. The proposed methods are model reduction methods. First, the Proper Generalized Decomposition method (PGD) was extended to nonlinear transient cyclic 3D problems. The non-linearity was handled by combining the PGD method with the Discrete Empirical Interpolation Method (DEIM), a numerical strategy already used in the literature. Results showed the efficiency of the PGD in generating accurate results compared to the FEM solution, with a relative error of less than 1%. Then, a second approach was developed in order to reduce the computation time. It is based on the collection of the significant modes calculated with the PGD method for different time scales. A dictionary assembling these modes is then used to compute the solution for different characteristic times and different boundary conditions. This approach was adapted to the case of a weakly coupled diffusion-thermal problem. The novelty of this method is to consider a dictionary composed of spatio-temporal bases, and not only spatial ones as in the POD. The results again showed an accurate reproduction of the solution in addition to a huge time reduction. However, when different cycle times are considered, the number of modes increases, which limits the usage of the approach. To overcome this limitation, a third numerical strategy is proposed in this thesis. It consists in considering a priori known time bases and is called the mixed strategy. The originality of this approach lies in the construction of a priori time bases based on the Fourier analysis of different simulations for different time scales and different parameter values. Once this study is done, an analytical expression of the time bases as a function of parameters such as the characteristic time and the cycle time is proposed. The related spatial bases are calculated using the PGD algorithm. This method is then tested for the resolution of 3D thermal problems under linear and nonlinear cyclic loading and of a weakly coupled diffusion-thermal problem.
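The online stage of the dictionary-based approach above amounts to projecting a new solution onto a collection of previously computed space-time modes, so only a small least-squares system has to be solved. A bare-bones linear-algebra sketch with arbitrary dimensions:

import numpy as np

rng = np.random.default_rng(0)
n_dof, n_steps, n_modes = 50, 40, 6

# dictionary of space-time modes from earlier PGD solves, one flattened mode per column
D = rng.normal(size=(n_dof * n_steps, n_modes))

# new thermal field to approximate (here synthesised from the modes plus a little noise)
u_new = D @ rng.normal(size=n_modes) + 0.01 * rng.normal(size=n_dof * n_steps)

# online stage: least-squares projection onto the dictionary, a tiny n_modes x n_modes problem
coeffs, *_ = np.linalg.lstsq(D, u_new, rcond=None)
print("relative error:", np.linalg.norm(u_new - D @ coeffs) / np.linalg.norm(u_new))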
APA, Harvard, Vancouver, ISO, and other styles
23

Stretta, Jean-Michel. "Contribution de la teledetection aerospatiale a l'elaboration des bases de l'halieutique operationnelle : l'exemple des pecheries thonieres tropicales de surface (aspect predictif)." Paris 6, 1991. http://www.theses.fr/1991PA066346.

Full text
Abstract:
After reviewing the main environmental parameters and the biotic and abiotic factors associated with tuna schools that govern tuna habitat, we specify them in relation to the physiological needs of the fish. Tuna are active predators, and their feeding drives their distribution within the bounds defined by the physical parameters. Although tuna movements are difficult to observe from space, we infer them from models based on their feeding behaviour. In searching for areas favourable to tuna, we know that they are attracted by perceptible anomalies in their immediate environment and that these anomalies give rise to zones with a high density of food. In tropical waters, the system that favours enrichment of the epipelagic layer is the rise of the thermocline towards the surface. This leads us to develop a new concept in tuna ecology: the hydrological history of a water mass. This notion allows us to approach the problem of locating food by studying the surface thermal signature of the fertilisation mechanisms of water masses, detectable by aircraft or satellites equipped with infrared radiometers that measure sea-surface temperature. The simplest model for identifying the period and the area where the probability of finding prey animals is highest consists of an analysis (called a praxeological analysis) of the evolution of the surface temperature in space and time. The PREVI-PECHE forecasting model was developed on this basis to predict, for the French, Ivorian and Senegalese tuna fleets operating in the Atlantic, the areas favourable to fishing. This model demonstrated its effectiveness, since more than 70% of the purse-seine sets made by the fleets are validated by the analysis of the thermal evolution that precedes them. PREVI-PECHE showed that remote sensing, even in rudimentary form, can contribute to the forecasting of tuna fishing areas. However, forecasting models for fishing activities should take on a new dimension with the use of techniques for simulating animal behaviour.
APA, Harvard, Vancouver, ISO, and other styles
24

Pinaire, Jessica. "Explorer les trajectoires de patients via les bases médico-économiques : application à l'infarctus du myocarde." Thesis, Montpellier, 2017. http://www.theses.fr/2017MONTS020/document.

Full text
Abstract:
Avec environ 120 000 personnes atteintes chaque année, 12 000 décès suite à la première crise et 18 000 décès après une année, l'infarctus du myocarde est un enjeu majeur de santé publique. Cette pathologie nécessite une hospitalisation et une prise en charge dans une unité de soins intensifs de cardiologie. Pour étudier cette pathologie, nous nous sommes orientés vers les bases hospitalières du PMSI. La collecte des données hospitalières dans le cadre du PMSI génère sur le plan national des bases de données de l'ordre de 25 millions d'enregistrements par an. Ces données, qui sont initialement recueillies à des fins médico-économiques, contiennent des informations qui peuvent avoir d'autres finalités : amélioration de la prise en charge du patient, prédiction de l'évolution des soins, planification de leurs coûts, etc. Ainsi émerge un autre enjeu : celui de fournir des outils d'explorations des trajectoires hospitalières des patients à partir des données issues du PMSI. Par le biais de plusieurs objectifs, les travaux menés dans le cadre de cette thèse ont pour vocation de proposer des outils combinant des méthodes issues de trois disciplines : informatique médicale, fouille de données et biostatistique. Nous apportons quatre contributions. La première contribution concerne la constitution d'une base de données de qualité pour analyser les trajectoires de patients. La deuxième contribution est une méthode semi-automatique pour la revue systématique de la littérature. Cette partie des travaux délimite les contours du concept de trajectoire dans le domaine biomédical. La troisième contribution est l'identification des parcours à risque dans la prédiction du décès intra-hospitalier. Notre stratégie de recherche s'articule en deux phases : 1) Identification de trajectoires types de patients à l'aide d'outils issus de la fouille de données ; 2) Construction d'un modèle de prédiction à partir de ces trajectoires afin de prédire le décès. Enfin, la dernière contribution est la caractérisation des flux de patients à travers les différents évènements hospitaliers mais aussi en termes de délais d'occurrences et de coûts de ces évènements. Dans cette partie, nous proposons à nouveau une alliance entre une méthode de fouille de données et de classification de données longitudinales.
With approximately 120,000 people affected each year, 12,000 deaths from the first crisis and 18,000 deaths after one year, myocardial infarction is a major public health issue. This pathology requires hospitalization and management in an intensive care cardiology unit. We study this pathology using the French national Prospective Payment System (PPS) databases. The collection of national hospital data within the framework of the PPS generates about 25 million records per year. These data, which are initially collected for medico-economic purposes, contain information that may serve other purposes: improving patient care, predicting the evolution of care, planning its costs, etc. Another emerging issue is that of providing tools for exploring patients' hospital trajectories using data from the PPS. Through several objectives, this thesis aims to provide tools combining methods from three disciplines: medical informatics, data mining and biostatistics. We make four contributions. The first contribution concerns the constitution of a quality database to analyze patient trajectories. The second contribution is a semi-automatic method for the systematic review of the literature; this part of the work delineates the contours of the trajectory concept in the biomedical field. The third contribution is the identification of at-risk trajectories in the prediction of in-hospital death. Our research strategy is divided into two phases: 1) identification of typical patient trajectories using data mining tools; 2) construction of a prediction model from these trajectories to predict death. Finally, the last contribution is the characterization of patient flows through the various hospital events, also in terms of the delays and costs of these events. In this contribution, we again propose a combination of a data mining method and a longitudinal data clustering technique.
APA, Harvard, Vancouver, ISO, and other styles
25

López, Massaguer Oriol 1972. "Development of informatic tools for extracting biomedical data from open and propietary data sources with predictive purposes." Doctoral thesis, Universitat Pompeu Fabra, 2017. http://hdl.handle.net/10803/471540.

Full text
Abstract:
Hem desenvolupat noves eines de software per tal d’obtenir informació de fonts publiques i privades per tal de desenvolupar models de toxicitat in silico. La primera eina es Collector, una aplicació de programari lliure que genera series de compostos preparats per fer modelat QSAR anotats amb bioactivitats extretes de la plataforma Open PHACTS usant tecnologies de la web semàntica. Collector ha estat utilitzada dins el projecte eTOX per desenvolupar models predictius sobre endpoints de toxicitat. Addicionalment hem concebut, desenvolupat i implementat un mètode per derivar scorings de toxicitat apropiats per modelatge predictiu que utilitza les dades obtingudes de informes d’estudis amb dosis repetides in vivo de la industria farmacèutica. El nostre mètode ha estat testejant aplicant-lo al modelat de hepatotoxicitat obtenint les dades corresponents per 3 endpoints: ‘degenerative lesions’, ‘inflammatory liver changes’ and ‘non-neoplasic proliferative lesions’. S’ha validat la idoneïtat d’aquestes dades obtingudes comparant-les amb els valors de point of departure obtinguts experimentalment i també desenvolupant models QSAR de prova obtenint resultats acceptables. El nostre mètode es basa en la inferència basada en ontologies per extreure informació de la nostra base de dades on tenim dades anotades basades en ontologies. El nostre mètode també es pot aplicar a altres bases de dades amb informació preclínica per generar scorings de toxicitat. Addicionalment el nostre mètode d’inferència basat en ontologies es pot aplicar a d’altre bases de dades relacionals anotades amb ontologies.
We developed new software tools to obtain information from public and private data sources to develop in silico toxicity models. The first of these tools is Collector, an Open Source application that generates “QSAR-ready” series of compounds annotated with bioactivities, extracting the data from the Open PHACTS platform using semantic web technologies. Collector was applied in the framework of the eTOX project to develop predictive models for toxicity endpoints. Additionally, we conceived, designed, implemented and tested a method to derive toxicity scorings suitable for predictive modelling starting from in vivo preclinical repeated-dose studies generated by the pharmaceutical industry. This approach was tested by generating scorings for three hepatotoxicity endpoints: ‘degenerative lesions’, ‘inflammatory liver changes’ and ‘non-neoplasic proliferative lesions’. The suitability of these scores was tested by comparing them with experimentally obtained point of departure doses as well as by developing tentative QSAR models, obtaining acceptable results. Our method relies on ontology-based inference to extract information from our ontology annotated data stored in a relational database. Our method, as a whole, can be applied to other preclinical toxicity databases to generate toxicity scorings. Moreover, the ontology-based inference method on its own is applicable to any relational databases annotated with ontologies.
APA, Harvard, Vancouver, ISO, and other styles
26

Alborzi, Seyed Ziaeddin. "Automatic Discovery of Hidden Associations Using Vector Similarity : Application to Biological Annotation Prediction." Electronic Thesis or Diss., Université de Lorraine, 2018. http://www.theses.fr/2018LORR0035.

Full text
Abstract:
Cette thèse présente: 1) le développement d'une nouvelle approche pour trouver des associations directes entre des paires d'éléments liés indirectement à travers diverses caractéristiques communes, 2) l'utilisation de cette approche pour associer directement des fonctions biologiques aux domaines protéiques (ECDomainMiner et GODomainMiner) et pour découvrir des interactions domaine-domaine, et enfin 3) l'extension de cette approche pour annoter de manière complète à partir des domaines les structures et les séquences des protéines. Au total, 20 728 et 20 318 associations EC-Pfam et GO-Pfam non redondantes ont été découvertes, avec des F-mesures de plus de 0,95 par rapport à un ensemble de référence Gold Standard extrait d'une source d'associations connues (InterPro). Par rapport à environ 1500 associations déterminées manuellement dans InterPro, ECDomainMiner et GODomainMiner produisent une augmentation de 13 fois le nombre d'associations EC-Pfam et GO-Pfam disponibles. Ces associations domaine-fonction sont ensuite utilisées pour annoter des milliers de structures de protéines et des millions de séquences de protéines pour lesquelles leur composition de domaine est connue mais qui manquent actuellement d'annotations fonctionnelles. En utilisant des associations de domaines ayant acquis des annotations fonctionnelles inférées, et en tenant compte des informations de taxonomie, des milliers de règles d'annotation ont été générées automatiquement. Ensuite, ces règles ont été utilisées pour annoter des séquences de protéines dans la base de données TrEMBL
This thesis presents: 1) the development of a novel approach to find direct associations between pairs of elements linked indirectly through various common features, 2) the use of this approach to directly associate biological functions to protein domains (ECDomainMiner and GODomainMiner), and to discover domain-domain interactions, and finally 3) the extension of this approach to comprehensively annotate protein structures and sequences. ECDomainMiner and GODomainMiner are two applications to discover new associations between EC Numbers and GO terms to protein domains, respectively. They find a total of 20,728 and 20,318 non-redundant EC-Pfam and GO-Pfam associations, respectively, with F-measures of more than 0.95 with respect to a “Gold Standard” test set extracted from InterPro. Compared to around 1500 manually curated associations in InterPro, ECDomainMiner and GODomainMiner infer a 13-fold increase in the number of available EC-Pfam and GO-Pfam associations. These function-domain associations are then used to annotate thousands of protein structures and millions of protein sequences for which their domain composition is known but that currently lack experimental functional annotations. Using inferred function-domain associations and considering taxonomy information, thousands of annotation rules have automatically been generated. Then, these rules have been utilized to annotate millions of protein sequences in the TrEMBL database
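The core idea, scoring a candidate EC-Pfam pair by the similarity of the two annotations' occurrence vectors over a shared set of proteins, can be illustrated in a few lines; the incidence data and the threshold below are fabricated:

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)
n_chains = 200
ec_names = [f"EC:{i}" for i in range(5)]
pfam_names = [f"PF{i:05d}" for i in range(8)]

# fabricated incidence matrices: which protein chains each EC number / Pfam domain annotates
ec_vec = (rng.random((len(ec_names), n_chains)) < 0.1).astype(float)
pfam_vec = (rng.random((len(pfam_names), n_chains)) < 0.1).astype(float)

sim = cosine_similarity(ec_vec, pfam_vec)        # one similarity score per candidate EC-Pfam pair
for i, j in zip(*np.where(sim > 0.2)):           # keep pairs above an arbitrary threshold
    print(ec_names[i], pfam_names[j], round(sim[i, j], 3))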
APA, Harvard, Vancouver, ISO, and other styles
27

Mbaye, Ndèye Maguette. "Multimodal learning to predict breast cancer prognosis." Electronic Thesis or Diss., Université Paris sciences et lettres, 2024. http://www.theses.fr/2024UPSLM017.

Full text
Abstract:
Le cancer du sein est l’un des cancers les plus fréquents dans le monde, représentant 12,5 % des nouveaux cas annuels. En 2022, environ 2,3 millions de femmes ont été diagnostiquées, avec plus de 666 000 décès. Bien que les dossiers médicaux électroniques (DME) aient révolutionné la recherche clinique en fournissant des données précieuses, les études sur le cancer du sein exploitent rarement les rapports médicaux en texte libre, qui contiennent pourtant des informations cruciales. Cette thèse propose de développer des modèles d’apprentissage automatique et profond pour prédire les statuts de survie du cancer du sein en utilisant des données multimodales (rapports textuels en français, résultats de laboratoire et descripteurs cliniques) d’une vaste cohorte de l’Institut Curie. Des modèles ont été construits pour analyser séparément puis conjointement ces modalités. Les résultats montrent que l’intégration des données textuelles et structurées améliore la prédiction des statuts de survie des patientes. De plus, l'analyse des facteurs prédictifs des statuts de survie des patients ouvre de nouvelles perspectives pour une meilleure compréhension des mécanismes du cancer du sein et par conséquent, l’amélioration des soins
Breast cancer is one of the most common cancers worldwide, accounting for 12.5% of new cases each year. In 2022, around 2.3 million women were diagnosed, with over 666,000 deaths. Although electronic health records (EHRs) have revolutionized clinical research by providing valuable data, breast cancer studies rarely exploit free-text medical reports, which nonetheless contain crucial information. This thesis proposes to develop machine and deep learning models to predict breast cancer outcomes using multimodal data (French text reports, laboratory results, clinical descriptors) from a large Institut Curie cohort. Models were built to analyze these modalities separately and then jointly. Results show that the integration of textual and structured data improves the prediction of patients' survival status. Moreover, the analysis of predictive factors for patients' survival status opens up new perspectives for a better understanding of underlying mechanisms in breast cancer, and thus, for improving care.
APA, Harvard, Vancouver, ISO, and other styles
28

Alborzi, Seyed Ziaeddin. "Automatic Discovery of Hidden Associations Using Vector Similarity : Application to Biological Annotation Prediction." Thesis, Université de Lorraine, 2018. http://www.theses.fr/2018LORR0035/document.

Full text
Abstract:
Cette thèse présente: 1) le développement d'une nouvelle approche pour trouver des associations directes entre des paires d'éléments liés indirectement à travers diverses caractéristiques communes, 2) l'utilisation de cette approche pour associer directement des fonctions biologiques aux domaines protéiques (ECDomainMiner et GODomainMiner) et pour découvrir des interactions domaine-domaine, et enfin 3) l'extension de cette approche pour annoter de manière complète à partir des domaines les structures et les séquences des protéines. Au total, 20 728 et 20 318 associations EC-Pfam et GO-Pfam non redondantes ont été découvertes, avec des F-mesures de plus de 0,95 par rapport à un ensemble de référence Gold Standard extrait d'une source d'associations connues (InterPro). Par rapport à environ 1500 associations déterminées manuellement dans InterPro, ECDomainMiner et GODomainMiner produisent une augmentation de 13 fois le nombre d'associations EC-Pfam et GO-Pfam disponibles. Ces associations domaine-fonction sont ensuite utilisées pour annoter des milliers de structures de protéines et des millions de séquences de protéines pour lesquelles leur composition de domaine est connue mais qui manquent actuellement d'annotations fonctionnelles. En utilisant des associations de domaines ayant acquis des annotations fonctionnelles inférées, et en tenant compte des informations de taxonomie, des milliers de règles d'annotation ont été générées automatiquement. Ensuite, ces règles ont été utilisées pour annoter des séquences de protéines dans la base de données TrEMBL
This thesis presents: 1) the development of a novel approach to find direct associations between pairs of elements linked indirectly through various common features, 2) the use of this approach to directly associate biological functions to protein domains (ECDomainMiner and GODomainMiner), and to discover domain-domain interactions, and finally 3) the extension of this approach to comprehensively annotate protein structures and sequences. ECDomainMiner and GODomainMiner are two applications that discover new associations of EC numbers and GO terms, respectively, with protein domains. They find a total of 20,728 and 20,318 non-redundant EC-Pfam and GO-Pfam associations, respectively, with F-measures of more than 0.95 with respect to a “Gold Standard” test set extracted from InterPro. Compared to around 1500 manually curated associations in InterPro, ECDomainMiner and GODomainMiner infer a 13-fold increase in the number of available EC-Pfam and GO-Pfam associations. These function-domain associations are then used to annotate thousands of protein structures and millions of protein sequences whose domain composition is known but which currently lack experimental functional annotations. Using inferred function-domain associations and considering taxonomy information, thousands of annotation rules have automatically been generated. Then, these rules have been utilized to annotate millions of protein sequences in the TrEMBL database.
APA, Harvard, Vancouver, ISO, and other styles
29

Schaberreiter, T. (Thomas). "A Bayesian network based on-line risk prediction framework for interdependent critical infrastructures." Doctoral thesis, Oulun yliopisto, 2013. http://urn.fi/urn:isbn:9789526202129.

Full text
Abstract:
Critical Infrastructures (CIs) are an integral part of our society and economy. Services like electricity supply or telecommunication services are expected to be available at all times, and a service failure may have catastrophic consequences for society or the economy. Current CI protection strategies are from a time when CIs or CI sectors could be operated more or less self-sufficiently, and when interconnections among CIs or CI sectors, which may lead to cascading service failures in other CIs or CI sectors, were not as omnipresent as today. In this PhD thesis, a cross-sector CI model for on-line risk monitoring of CI services, called the CI security model, is presented. The model makes it possible to monitor a CI service risk and to notify services that depend on it of possible risks, in order to reduce and mitigate possible cascading failures. The model estimates CI service risk by observing the CI service state as measured by base measurements (e.g. sensor or software states) within the CI service components and by observing the experienced service risk of the CI services it depends on (CI service dependencies). CI service risk is estimated in a probabilistic way using a Bayesian network based approach. Furthermore, the model allows CI service risk prediction in the short-term, mid-term and long-term future, given a current CI service risk, and it makes it possible to model interdependencies (a CI service risk that loops back to the originating service via dependencies), a special case that is difficult to model using Bayesian networks. The representation of a CI as a CI security model requires analysis. In this PhD thesis, a CI analysis method based on the PROTOS-MATINE dependency analysis methodology is presented in order to analyse CIs and represent them as CI services, CI service dependencies and base measurements. Additional research presented in this PhD thesis is related to a study of assurance indicators able to perform an on-line evaluation of the correctness of risk estimates within a CI service, as well as of risk estimates received from dependencies. A tool that supports all steps of establishing a CI security model was implemented during this PhD research. The research on the CI security model and the assurance indicators was validated in a case study, and the initial results suggest its applicability to CI environments.
Tiivistelmä Tässä väitöskirjassa esitellään läpileikkausmalli kriittisten infrastruktuurien jatkuvaan käytön riskimallinnukseen. Tämän mallin avulla voidaan tiedottaa toisistaan riippuvaisia palveluita mahdollisista vaaroista, ja siten pysäyttää tai hidastaa toisiinsa vaikuttavat ja kumuloituvat vikaantumiset. Malli analysoi kriittisen infrastruktuurin palveluriskiä tutkimalla kriittisen infrastruktuuripalvelun tilan, joka on mitattu perusmittauksella (esimerkiksi anturi- tai ohjelmistotiloina) kriittisen infrastruktuurin palvelukomponenttien välillä ja tarkkailemalla koetun kriittisen infrastruktuurin palveluriskiä, joista palvelut riippuvat (kriittisen infrastruktuurin palveluriippuvuudet). Kriittisen infrastruktuurin palveluriski arvioidaan todennäköisyyden avulla käyttämällä Bayes-verkkoja. Lisäksi malli mahdollistaa tulevien riskien ennustamisen lyhyellä, keskipitkällä ja pitkällä aikavälillä, ja mahdollistaa niiden keskinäisten riippuvuuksien mallintamisen, joka on yleensä vaikea esittää Bayes-verkoissa. Kriittisen infrastruktuurin esittäminen kriittisen infrastruktuurin tietoturvamallina edellyttää analyysiä. Tässä väitöskirjassa esitellään kriittisen infrastruktuurin analyysimenetelmä, joka perustuu PROTOS-MATINE -riippuvuusanalyysimetodologiaan. Kriittiset infrastruktuurit esitetään kriittisen infrastruktuurin palveluina, palvelujen keskinäisinä riippuvuuksina ja perusmittauksina. Lisäksi tutkitaan varmuusindikaattoreita, joilla voidaan tutkia suoraan toiminnassa olevan kriittisen infrastruktuuripalvelun riskianalyysin oikeellisuutta, kuin myös riskiarvioita riippuvuuksista. Tutkimuksessa laadittiin työkalu, joka tukee kriittisen infrastruktuurin tietoturvamallin toteuttamisen kaikkia vaiheita. Kriittisen infrastruktuurin tietoturvamalli ja varmuusindikaattorien oikeellisuus vahvistettiin konseptitutkimuksella, ja alustavat tulokset osoittavat menetelmän toimivuuden
Kurzfassung In dieser Doktorarbeit wird ein Sektorübergreifendes Modell für die kontinuierliche Risikoabschätzung von kritische Infrastrukturen im laufenden Betrieb vorgestellt. Das Modell erlaubt es, Dienstleistungen, die in Abhängigkeit einer anderen Dienstleistung stehen, über mögliche Gefahren zu informieren und damit die Gefahr des Übergriffs von Risiken in andere Teile zu stoppen oder zu minimieren. Mit dem Modell können Gefahren in einer Dienstleistung anhand der Überwachung von kontinuierlichen Messungen (zum Beispiel Sensoren oder Softwarestatus) sowie der Überwachung von Gefahren in Dienstleistungen, die eine Abhängigkeit darstellen, analysiert werden. Die Abschätzung von Gefahren erfolgt probabilistisch mittels eines Bayessches Netzwerks. Zusätzlich erlaubt dieses Modell die Voraussage von zukünftigen Risiken in der kurzfristigen, mittelfristigen und langfristigen Zukunft und es erlaubt die Modellierung von gegenseitigen Abhängigkeiten, die im Allgemeinen schwer mit Bayesschen Netzwerken darzustellen sind. Um eine kritische Infrastruktur als ein solches Modell darzustellen, muss eine Analyse der kritischen Infrastruktur durchgeführt werden. In dieser Doktorarbeit wird diese Analyse durch die PROTOS-MATINE Methode zur Analyse von Abhängigkeiten unterstützt. Zusätzlich zu dem vorgestellten Modell wird in dieser Doktorarbeit eine Studie über Indikatoren, die das Vertrauen in die Genauigkeit einer Risikoabschätzung evaluieren können, vorgestellt. Die Studie beschäftigt sich sowohl mit der Evaluierung von Risikoabschätzungen innerhalb von Dienstleistungen als auch mit der Evaluierung von Risikoabschätzungen, die von Dienstleistungen erhalten wurden, die eine Abhängigkeiten darstellen. Eine Software, die alle Aspekte der Erstellung des vorgestellten Modells unterstützt, wurde entwickelt. Sowohl das präsentierte Modell zur Abschätzung von Risiken in kritischen Infrastrukturen als auch die Indikatoren zur Uberprüfung der Risikoabschätzungen wurden anhand einer Machbarkeitsstudie validiert. Erste Ergebnisse suggerieren die Anwendbarkeit dieser Konzepte auf kritische Infrastrukturen
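To give an intuition for the kind of probabilistic risk propagation described in this abstract, the following is a minimal, hand-rolled sketch rather than the thesis's actual model: all variables, states and probabilities are invented, and a real CI security model would use a full Bayesian network rather than a single conditional probability table.

```python
# Minimal sketch of discrete risk propagation in the spirit of a Bayesian network.
# All states and probabilities below are invented for illustration only.

# Prior over the dependency's risk level and over the local base measurement.
p_dep = {"low": 0.8, "high": 0.2}            # P(dependency risk)
p_sensor = {"ok": 0.9, "degraded": 0.1}      # P(local sensor state)

# Conditional probability table: P(service risk = high | dependency risk, sensor state).
p_service_high = {
    ("low", "ok"): 0.02,
    ("low", "degraded"): 0.30,
    ("high", "ok"): 0.25,
    ("high", "degraded"): 0.80,
}

def service_risk_given_sensor(sensor_state):
    """P(service risk = high | observed sensor state), marginalising over the dependency."""
    return sum(p_dep[d] * p_service_high[(d, sensor_state)] for d in p_dep)

for state in p_sensor:
    print(f"sensor={state:9s} -> P(service risk high) = {service_risk_given_sensor(state):.3f}")
```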
APA, Harvard, Vancouver, ISO, and other styles
30

Pichot, François. "Développement d’une méthode numérique pour la prédiction des dimensions d’un cordon de soudure tig : application aux superalliages bases cobalt et nickel." Thesis, Bordeaux 1, 2012. http://www.theses.fr/2012BOR14489/document.

Full text
Abstract:
Le procédé de soudage TIG est actuellement le plus utilisé dans l’industrie aéronautique du fait de la qualité des joints de soudure qu’il permet d’obtenir et de sa simplicité d’automatisation. Une opération de soudage provoque des gradients thermiques dus au passage de la source de chaleur sur la pièce qui induisent des déformations et des contraintes résiduelles pouvant impacter la durée de vie de l’assemblage. Ce travail vise à mettre en place un modèle de simulation de ce procédé dans le but d’optimiser les paramètres opératoires.Avant d’envisager un couplage thermomécanique, il convient de représenter convenablement les transferts thermiques au cours du soudage et en particulier l’apport de chaleur lié au procédé. Dans cette étude, on propose une source de chaleur prédictive simplifiée représentative des paramètres opératoires qui permet en particulier d’estimer les dimensions caractéristiques du cordon de soudure et de traduire fidèlement l’évolution thermique dans la pièce. Cette source est définie par un flux de chaleur homogène dépendant d’une puissance P, réparti sur un disque de rayon R, ces 2 paramètres numériques étant liés aux principaux paramètres opératoires de soudage que sont l’intensité I et la hauteur d’arc h.Une campagne d’essais expérimentaux dans laquelle on étudie les variations des dimensions de la Zone Fondue (ZF) pour des cas non pénétrants et pénétrants en fonction des paramètres opératoires (I, h) est présentée. Pour chaque essai, un couple de paramètres d’entrée de la source de chaleur (P, R) permettant de reproduire les dimensions du bain fondu est identifié. La confrontation des résultats obtenus numériquement et expérimentalement permet de mettre en place des relations entre les paramètres opératoires de soudage (I, h) et les paramètres numériques (P, R) conférant un caractère prédictif à la source de chaleur. Ce modèle de source a été validé pour différentes configurations de soudage en termes d’épaisseurs de tôles, de matériaux à assembler, de vitesses d’avance de la torche, ...Notre modèle thermique a ensuite servi de base pour la simulation thermomécanique du procédé. Le modèle est appliqué à l’assemblage de deux composants d’un turbomoteur en superalliage base Nickel
Gas Tungsten Arc Welding (GTAW) is the most widely used welding process in aeronautics, due to its weld quality. During a welding operation, the thermal source induces thermal gradients causing strains and stresses that could affect the assembly's service life. The aim of this study is to develop a numerical model of the welding process in order to optimize the process parameters. Before coupling thermal and mechanical phenomena, we must model the heat transfer during welding. We propose a simplified heat source linked to the process parameters which makes it possible to predict the main dimensions of the weld pool and the thermal evolution in the solid part. This source is defined by a homogeneous heat flux of power P distributed over a disk of radius R. These two numerical parameters are related to the process parameters, the arc height (h) and the current intensity (I). Experimental tests were carried out to study the weld pool dimensions for both cases: incomplete-penetration and full-penetration welds. For each test, we identified the heat source parameters (P, R) which reproduce the experimental weld pool dimensions. The comparison of numerical and experimental results makes it possible to establish relations between the heat source parameters (P, R) and the welding parameters (I, h), producing a predictive heat source. The reliability of the heat source was verified taking into account several welding configurations with various superalloy sheet thicknesses, welding speeds and materials. A coupled thermal-mechanical analysis, based on our thermal model, was applied to an industrial case: a nickel-based superalloy component assembly of a gas turbine.
APA, Harvard, Vancouver, ISO, and other styles
31

Derras, Boumédiène. "Estimation des mouvements sismiques et de leur variabilité par approche neuronale : Apport à la compréhension des effets de la source, de propagation et de site." Thesis, Université Grenoble Alpes (ComUE), 2017. http://www.theses.fr/2017GREAU013/document.

Full text
Abstract:
Cette thèse est consacrée à une analyse approfondie de la capacité des "réseaux de neurones artificiels" (RNA) à la prédiction des mouvements sismiques. Un premier volet important concerne la dérivation par RNA de "GMPE" (équations de prédiction du mouvement du sol) et la comparaison des performances ainsi obtenues avec celles des GMPE "classiques" obtenues sur la base de régressions empiriques avec une forme fonctionnelle préétablie (plus ou moins complexe). Pour effectuer l’étude comparative et obtenir les deux composnates inter-événement « betweeen-event » et intra-événement « within-event » de la variabilité aléatoire, nous intégrons l’algorithme du « modèle à effets aléatoires » à l’approche neuronale. Cette approche est testée sur différents jeux de données réelles et synthétiques : la base de données compilée à partir d'événements européens, méditerranéens et du Moyen-Orient (RESORCE : Reference database for Seismic grOund-motion pRediction in Europe), la base de données NGA-West 2 (Next Generation Attenuation West 2 développée aux USA), la base de données japonaise dérivée du réseau accélérométrique KiK-net. En outre, un set de données synthétiques provenant d'une approche par simulation stochastique est utilisé. Les paramètres du mouvement du sol les plus utilisés en génie parasismique (PGA, PGV, spectres de réponse et également, dans certains cas, les fonctions d'amplification locales) sont considérés. Les modèles neuronaux ainsi obtenus, complètement dirigés par les données « data-driven », nous renseignent sur les influences respectives et éventuellement couplées de l’atténuation avec la distance, de l'effet d’échelle lié à la magnitude, des conditions de site et notamment la présence éventuelle de non-linéarités. Un autre volet important est consacré à l'utilisation des RNA pour tester la pertinence de différents proxies de site, au travers de leur capacité à réduire la variabilité aléatoire des prédictions de mouvement du sol. Utilisés individuellement ou en couple, ces proxies de site décrivent de manière plus ou moins détaillée l'influence des conditions de site locales sur le mouvement sismique. Dans ce même volet, nous amorçons également une étude des liens entre les aspects non-linéaire de la réponse de site, et les différents proxies de site. Le troisième volet se concentre sur certain effets liés à la source : analyse de l’influence du style de la faille sismique sur le mouvement du sol, ainsi qu'une approche indirecte de la dépendance entre la magnitude et la chute de contrainte sismique
This thesis is devoted to an in-depth analysis of the ability of "Artificial Neural Networks" (ANN) to achieve reliable ground motion predictions. A first important aspect concerns the derivation of "GMPE" (Ground Motion Prediction Equations) with an ANN approach, and the comparison of their performance with those of "classical" GMPEs derived on the basis of empirical regressions with pre-established, more or less complex, functional forms. To perform such a comparison involving the two "between-event" and "within-event" components of the random variability, we adapt the algorithm of the "random effects model" to the neural approach. This approach is tested on various, real and synthetic, datasets: the database compiled from European, Mediterranean and Middle Eastern events (RESORCE: Reference database for Seismic grOund-motion pRediction in Europe), the database NGA West 2 (Next Generation Attenuation West 2 developed in the USA), and the Japanese database derived from the KiK-net accelerometer network. In addition, a comprehensive set of synthetic data is also derived with a stochastic simulation approach. The considered ground motion parameters are those which are most used in earthquake engineering (PGA, PGV, response spectra and also, in some cases, local amplification functions). Such completely "data-driven" neural models inform us about the respective, and possibly coupled, influences of the amplitude decay with distance, the magnitude scaling effects, and the site conditions, with a particular focus on the detection of non-linearities in site response. Another important aspect is the use of ANNs to test the relevance of different site proxies, through their ability to reduce the random variability of ground motion predictions. The ANN approach allows such site proxies to be used either individually or combined, and their respective impact on the various characteristics of ground motion to be investigated. The same section also includes an investigation of the links between the non-linear aspects of the site response and the different site proxies. Finally, the third section focuses on a few source-related effects: analysis of the influence of the "style of faulting" on ground motion, and, indirectly, the dependence between magnitude and seismic stress drop.
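Purely as a hedged illustration of what a data-driven GMPE of this kind can look like in code (not the thesis's actual network, data or functional form: the synthetic recordings, feature choice and network size below are assumptions), a small feed-forward regressor could be set up as follows:

```python
# Toy neural-network ground-motion model: predict log10(PGA) from magnitude,
# log10(distance) and a Vs30 site proxy. Synthetic data for illustration only.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n = 2000
mag = rng.uniform(4.0, 7.5, n)
dist = rng.uniform(5.0, 200.0, n)          # km
vs30 = rng.uniform(180.0, 800.0, n)        # m/s

# Invented attenuation-like relation standing in for real recordings.
log_pga = 0.9 * mag - 1.6 * np.log10(dist) - 0.4 * np.log10(vs30) + rng.normal(0, 0.3, n)

X = np.column_stack([mag, np.log10(dist), np.log10(vs30)])
model = MLPRegressor(hidden_layer_sizes=(10,), max_iter=5000, random_state=0).fit(X, log_pga)

# Predict the median ground motion for a new scenario (M 6.5 at 30 km, Vs30 = 400 m/s).
print(model.predict([[6.5, np.log10(30.0), np.log10(400.0)]]))
```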
APA, Harvard, Vancouver, ISO, and other styles
32

Sengupta, Aritra. "Empirical Hierarchical Modeling and Predictive Inference for Big, Spatial, Discrete, and Continuous Data." The Ohio State University, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=osu1350660056.

Full text
APA, Harvard, Vancouver, ISO, and other styles
33

Petřík, Patrik. "Predikce vývoje akciového trhu prostřednictvím technické a psychologické analýzy." Master's thesis, Vysoké učení technické v Brně. Fakulta podnikatelská, 2010. http://www.nusl.cz/ntk/nusl-222507.

Full text
Abstract:
This work deals with stock market prediction via technical and psychological analysis. We introduce the theoretical foundations of technical and psychological analysis. We also introduce some methods of artificial intelligence, especially neural networks and genetic algorithms. We design a system for stock market prediction, and implement and test a part of the system. In conclusion, we discuss the results.
APA, Harvard, Vancouver, ISO, and other styles
34

Koseler, Kaan Tamer. "Realization of Model-Driven Engineering for Big Data: A Baseball Analytics Use Case." Miami University / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=miami1524832924255132.

Full text
APA, Harvard, Vancouver, ISO, and other styles
35

Šenovský, Jakub. "Dolování z dat v jazyce Python." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2017. http://www.nusl.cz/ntk/nusl-363895.

Full text
Abstract:
The main goal of this thesis was to become acquainted with the phases of data mining and with the support for data mining offered by the programming languages Python and R, and to demonstrate their use in two case studies. A comparison of these languages in the field of data mining is also included. The data preprocessing phase and the mining algorithms for classification, prediction and clustering are described, and the most significant libraries for Python and R are presented. In the first case study, work with time series is demonstrated using the ARIMA model and neural networks, with accuracy verified using the mean squared error. In the second case study, the results of football matches are classified using K-Nearest Neighbors, a Bayes classifier, Random Forest and Logistic Regression. The precision of the classification is reported using the accuracy score and a confusion matrix. The work concludes with an evaluation of the achieved results and suggestions for future improvement of the individual models.
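For readers unfamiliar with the classification step summarised in this abstract, a minimal Python sketch of comparing such classifiers could look as follows; the data are a synthetic stand-in (the original football-match dataset is not available here), so the numbers it prints are not the thesis's results.

```python
# Compare several classifiers with accuracy and a confusion matrix, on synthetic data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix

X, y = make_classification(n_samples=1000, n_features=12, n_classes=3,
                           n_informative=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "k-NN": KNeighborsClassifier(n_neighbors=5),
    "Naive Bayes": GaussianNB(),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}
for name, model in models.items():
    y_pred = model.fit(X_tr, y_tr).predict(X_te)
    print(name, accuracy_score(y_te, y_pred))
    print(confusion_matrix(y_te, y_pred))
```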
APA, Harvard, Vancouver, ISO, and other styles
36

Hrach, Vlastimil. "Využití prostředků umělé inteligence na kapitálových trzích." Master's thesis, Vysoké učení technické v Brně. Fakulta podnikatelská, 2011. http://www.nusl.cz/ntk/nusl-222912.

Full text
Abstract:
The diploma thesis deals with the use of artificial intelligence for predictions on stock markets. The prediction is unconventionally based on Bayes' theorem and the Naive Bayes classifier derived from it. In the practical part, an algorithm is designed. The algorithm uses recognized relations between indicators of technical analysis; specifically, 20-day and 50-day exponential moving averages are used. The program output is a graphical forecast of future stock development, constructed from the classification of the relations between these indicators.
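To make the indicator-based setup concrete, here is a hedged sketch of deriving 20- and 50-day exponential moving averages and feeding their relation to a naive Bayes classifier. The price series is synthetic and the labelling rule (next-day direction) is an assumption for illustration, not the rule used in the thesis.

```python
# Sketch: 20/50-day EMA features -> naive Bayes prediction of next-day direction.
import numpy as np
import pandas as pd
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(1)
price = pd.Series(100 + rng.normal(0, 1, 600).cumsum())   # synthetic price series

ema20 = price.ewm(span=20, adjust=False).mean()
ema50 = price.ewm(span=50, adjust=False).mean()

features = pd.DataFrame({
    "ema_gap": (ema20 - ema50) / price,        # relation between the two averages
    "price_vs_ema20": (price - ema20) / price,
}).iloc[50:-1]
# Invented label: does the price rise the next day?
label = (price.shift(-1) > price).astype(int).iloc[50:-1]

split = int(len(features) * 0.8)
clf = GaussianNB().fit(features.iloc[:split], label.iloc[:split])
print("held-out accuracy:", clf.score(features.iloc[split:], label.iloc[split:]))
```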
APA, Harvard, Vancouver, ISO, and other styles
37

Carvalho, Jo?o Batista. "Predi??o em modelos de tempo de falha acelerado com efeito aleat?rio para avalia??o de riscos de falha em po?os petrol?feros." Universidade Federal do Rio Grande do Norte, 2010. http://repositorio.ufrn.br:8080/jspui/handle/123456789/18635.

Full text
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
We considered prediction techniques based on accelerated failure time models with random effects for correlated survival data. Besides the Bayesian approach through the empirical Bayes estimator, we also discussed the use of a classical predictor, the Empirical Best Linear Unbiased Predictor (EBLUP). In order to illustrate the use of these predictors, we considered applications to a real data set coming from the oil industry. More specifically, the data set involves the mean time between failures of petroleum-well equipment of the Bacia Potiguar. The goal of this study is to predict the risk/probability of failure in order to support a preventive maintenance program. The results show that both methods are suitable for predicting future failures, supporting good decisions regarding the employment and economy of resources for preventive maintenance.
Consideramos técnicas de predição baseadas em modelos de tempo de falha acelerado com efeito aleatório para dados de sobrevivência correlacionados. Além do enfoque bayesiano através do Estimador de Bayes Empírico, também discutimos sobre o uso de um método clássico, o Melhor Preditor Linear Não Viciado Empírico (EBLUP), nessa classe de modelos. Para ilustrar a utilidade desses métodos, fazemos aplicações a um conjunto de dados reais envolvendo tempos entre falhas de equipamentos de poços de petróleo da Bacia Potiguar. Neste contexto, o objetivo é predizer os riscos/probabilidades de falha com a finalidade de subsidiar programas de manutenção preventiva. Os resultados obtidos mostram que ambos os métodos são adequados para prever falhas futuras, proporcionando boas decisões em relação ao emprego e economia de recursos para manutenção preventiva
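As a small numerical illustration of the random-effect prediction idea mentioned in this abstract, the sketch below uses the textbook normal-normal shrinkage (BLUP) formula with known variance components, not the accelerated-failure-time formulation actually used in the dissertation; all numbers and group names are invented.

```python
# EBLUP-style shrinkage of group-level random effects in a normal-normal model.
# For group i with n_i observations, the predicted random effect is
#   u_i = (sigma_u^2 / (sigma_u^2 + sigma_e^2 / n_i)) * (ybar_i - mu).
import numpy as np

mu = 10.0            # overall mean (e.g. a log mean time between failures)
sigma_u2 = 0.5       # between-group (well-to-well) variance
sigma_e2 = 2.0       # residual variance

groups = {           # group: observed values (invented)
    "well_A": np.array([11.2, 10.8, 12.0, 11.5]),
    "well_B": np.array([8.9, 9.4]),
}
for name, y in groups.items():
    shrink = sigma_u2 / (sigma_u2 + sigma_e2 / len(y))
    u_hat = shrink * (y.mean() - mu)
    print(f"{name}: shrinkage={shrink:.2f}, predicted random effect={u_hat:+.2f}, "
          f"predicted group mean={mu + u_hat:.2f}")
```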
APA, Harvard, Vancouver, ISO, and other styles
38

Hellsing, Edvin, and Joel Klingberg. "It’s a Match: Predicting Potential Buyers of Commercial Real Estate Using Machine Learning." Thesis, Uppsala universitet, Institutionen för informatik och media, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-445229.

Full text
Abstract:
This thesis has explored the development and potential effects of an intelligent decision support system (IDSS) to predict potential buyers of commercial real estate property. The overarching need for an IDSS of this type has been identified to exist due to information overload, which the IDSS aims to reduce. By shortening the time needed to process data, time can be allocated to making sense of the environment with colleagues. The system architecture explored consisted of clustering commercial real estate buyers into groups based on their characteristics, and training a prediction model on historical transaction data for the Swedish market from the cadastral and land registration authority. The prediction model was trained to predict which of the cluster groups will most likely buy a given property. For the clustering, three different clustering algorithms were used and evaluated: one density-based, one centroid-based and one hierarchy-based. The best-performing clustering model was the centroid-based one (K-means). For the predictions, three supervised machine learning algorithms were used and evaluated: Naive Bayes, Random Forests and Support Vector Machines. The model based on Random Forests performed the best, with an accuracy of 99.9%.
Denna uppsats har undersökt utvecklingen av och potentiella effekter med ett intelligent beslutsstödssystem (IDSS) för att prediktera potentiella köpare av kommersiella fastigheter. Det övergripande behovet av ett sådant system har identifierats existerar på grund av informtaionsöverflöd, vilket systemet avser att reducera. Genom att förkorta bearbetningstiden av data kan tid allokeras till att skapa förståelse av omvärlden med kollegor. Systemarkitekturen som undersöktes bestod av att gruppera köpare av kommersiella fastigheter i kluster baserat på deras köparegenskaper, och sedan träna en prediktionsmodell på historiska transkationsdata från den svenska fastighetsmarknaden från Lantmäteriet. Prediktionsmodellen tränades på att prediktera vilken av grupperna som mest sannolikt kommer köpa en given fastighet. Tre olika klusteralgoritmer användes och utvärderades för grupperingen, en densitetsbaserad, en centroidbaserad och en hierarkiskt baserad. Den som presterade bäst var var den centroidbaserade (K-means). Tre övervakade maskininlärningsalgoritmer användes och utvärderades för prediktionerna. Dessa var Naive Bayes, Random Forests och Support Vector Machines. Modellen baserad p ̊a Random Forests presterade bäst, med en noggrannhet om 99,9%.
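A compressed sketch of the two-stage architecture described above (cluster the buyers, then predict the buyer cluster for a property) might look like the following; the feature matrices are synthetic and the cluster count, forest size and variable names are assumptions, not the thesis's actual choices.

```python
# Stage 1: cluster buyers by their characteristics. Stage 2: predict which
# cluster is the likely buyer of a property from property features.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
buyer_features = rng.normal(size=(500, 4))            # e.g. portfolio size, sector mix, ...
buyer_cluster = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(buyer_features)

# Each historical transaction: property features plus the cluster of its buyer.
property_features = buyer_features @ rng.normal(size=(4, 6)) + rng.normal(scale=0.5, size=(500, 6))
X_tr, X_te, y_tr, y_te = train_test_split(property_features, buyer_cluster, random_state=0)

clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
print("accuracy on held-out transactions:", clf.score(X_te, y_te))
```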
APA, Harvard, Vancouver, ISO, and other styles
39

González, Rubio Jesús. "On the effective deployment of current machine translation technology." Doctoral thesis, Universitat Politècnica de València, 2014. http://hdl.handle.net/10251/37888.

Full text
Abstract:
Machine translation is a fundamental technology that is gaining more importance each day in our multilingual society. Companies and individuals are turning their attention to machine translation since it dramatically cuts down their expenses on translation and interpreting. However, the output of current machine translation systems is still far from the quality of translations generated by human experts. The overall goal of this thesis is to narrow down this quality gap by developing new methodologies and tools that improve the broader and more efficient deployment of machine translation technology. We start by proposing a new technique to improve the quality of the translations generated by fully-automatic machine translation systems. The key insight of our approach is that different translation systems, implementing different approaches and technologies, can exhibit different strengths and limitations. Therefore, a proper combination of the outputs of such different systems has the potential to produce translations of improved quality. We present minimum Bayes' risk system combination, an automatic approach that detects the best parts of the candidate translations and combines them to generate a consensus translation that is optimal with respect to a particular performance metric. We thoroughly describe the formalization of our approach as a weighted ensemble of probability distributions and provide efficient algorithms to obtain the optimal consensus translation according to the widespread BLEU score. Empirical results show that the proposed approach is indeed able to generate statistically better translations than the provided candidates. Compared to other state-of-the-art system combination methods, our approach reports similar performance while not requiring any additional data beyond the candidate translations. Then, we focus our attention on how to improve the utility of automatic translations for the end-user of the system. Since automatic translations are not perfect, a desirable feature of machine translation systems is the ability to predict at run-time the quality of the generated translations. Quality estimation is usually addressed as a regression problem where a quality score is predicted from a set of features that represents the translation. However, although the concept of translation quality is intuitively clear, there is no consensus on which are the features that actually account for it. As a consequence, quality estimation systems for machine translation have to utilize a large number of weak features to predict translation quality. This involves several learning problems related to feature collinearity and ambiguity, and to the “curse” of dimensionality. We address these challenges by adopting a two-step training methodology. First, a dimensionality reduction method computes, from the original features, the reduced set of features that better explains translation quality. Then, a prediction model is built from this reduced set to finally predict the quality score. We study various reduction methods previously used in the literature and propose two new ones based on statistical multivariate analysis techniques. More specifically, the proposed dimensionality reduction methods are based on partial least squares regression. The results of a thorough experimentation show that the quality estimation systems estimated following the proposed two-step methodology obtain better prediction accuracy than systems estimated using all the original features. Moreover, one of the proposed dimensionality reduction methods obtained the best prediction accuracy with only a fraction of the original features. This feature reduction ratio is important because it implies a dramatic reduction of the operating times of the quality estimation system. An alternative use of current machine translation systems is to embed them within an interactive editing environment where the system and a human expert collaborate to generate error-free translations. This interactive machine translation approach has been shown to reduce the supervision effort of the user in comparison to the conventional decoupled post-editing approach. However, interactive machine translation considers the translation system as a passive agent in the interaction process. In other words, the system only suggests translations to the user, who then makes the necessary supervision decisions. As a result, the user is bound to exhaustively supervise every suggested translation. This passive approach ensures error-free translations but it also demands a large amount of supervision effort from the user. Finally, we study different techniques to improve the productivity of current interactive machine translation systems. Specifically, we focus on the development of alternative approaches where the system becomes an active agent in the interaction process. We propose two different active approaches. On the one hand, we describe an active interaction approach where the system informs the user about the reliability of the suggested translations. The hope is that this information may help the user to locate translation errors, thus improving the overall translation productivity. We propose different scores to measure translation reliability at the word and sentence levels and study the influence of such information on the productivity of an interactive machine translation system. Empirical results show that the proposed active interaction protocol is able to achieve a large reduction in supervision effort while still generating translations of very high quality. On the other hand, we study an active learning framework for interactive machine translation. In this case, the system is not only able to inform the user of which suggested translations should be supervised, but it is also able to learn from the user-supervised translations to improve its future suggestions. We develop a value-of-information criterion to select which automatic translations undergo user supervision. However, given its high computational complexity, in practice we study different selection strategies that approximate this optimal criterion. Results of a large-scale experimentation show that the proposed active learning framework is able to obtain better compromises between the quality of the generated translations and the human effort required to obtain them. Moreover, in comparison to a conventional interactive machine translation system, our proposal obtained translations of twice the quality with the same supervision effort.
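As a hedged illustration of the two-step quality-estimation idea mentioned above (reduce many weak, partly collinear features, then predict a quality score), here is a small sketch using partial least squares; the feature matrix is synthetic and the component count is an arbitrary choice, not the value used in the thesis.

```python
# Two-step quality estimation: PLS dimensionality reduction, then score prediction.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
X = rng.normal(size=(800, 60))                        # many weak features
X[:, 30:] = X[:, :30] + rng.normal(scale=0.1, size=(800, 30))   # make half of them collinear
quality = X[:, :5].sum(axis=1) + rng.normal(scale=0.5, size=800)  # invented quality score

X_tr, X_te, y_tr, y_te = train_test_split(X, quality, random_state=0)
pls = PLSRegression(n_components=5).fit(X_tr, y_tr)  # projection and regression in one estimator
print("MAE on held-out sentences:", mean_absolute_error(y_te, pls.predict(X_te).ravel()))
```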
González Rubio, J. (2014). On the effective deployment of current machine translation technology [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/37888
APA, Harvard, Vancouver, ISO, and other styles
40

García, Durán Alberto. "Learning representations in multi-relational graphs : algorithms and applications." Thesis, Compiègne, 2016. http://www.theses.fr/2016COMP2271/document.

Full text
Abstract:
Internet offre une énorme quantité d’informations à portée de main et dans une telle variété de sujets, que tout le monde est en mesure d’accéder à une énorme variété de connaissances. Une telle grande quantité d’information pourrait apporter un saut en avant dans de nombreux domaines (moteurs de recherche, réponses aux questions, tâches NLP liées) si elle est bien utilisée. De cette façon, un enjeu crucial de la communauté d’intelligence artificielle a été de recueillir, d’organiser et de faire un usage intelligent de cette quantité croissante de connaissances disponibles. Heureusement, depuis un certain temps déjà des efforts importants ont été faits dans la collecte et l’organisation des connaissances, et beaucoup d’informations structurées peuvent être trouvées dans des dépôts appelés Bases des Connaissances (BCs). Freebase, Entity Graph Facebook ou Knowledge Graph de Google sont de bons exemples de BCs. Un grand problème des BCs c’est qu’ils sont loin d’êtres complets. Par exemple, dans Freebase seulement environ 30% des gens ont des informations sur leur nationalité. Cette thèse présente plusieurs méthodes pour ajouter de nouveaux liens entre les entités existantes de la BC basée sur l’apprentissage des représentations qui optimisent une fonction d’énergie définie. Ces modèles peuvent également être utilisés pour attribuer des probabilités à triples extraites du Web. On propose également une nouvelle application pour faire usage de cette information structurée pour générer des informations non structurées (spécifiquement des questions en langage naturel). On pense par rapport à ce problème comme un modèle de traduction automatique, où on n’a pas de langage correct comme entrée, mais un langage structuré. Nous adaptons le RNN codeur-décodeur à ces paramètres pour rendre possible cette traduction
The Internet provides a huge amount of information at hand on such a variety of topics that everyone is now able to access any kind of knowledge. Such a large quantity of information could bring a leap forward in many areas if used properly. Thus, a crucial challenge of the Artificial Intelligence community has been to gather, organize and make intelligent use of this growing amount of available knowledge. Fortunately, important efforts have been made in gathering and organizing knowledge for some time now, and a lot of structured information can be found in repositories called Knowledge Bases (KBs). A main issue with KBs is that they are far from being complete. This thesis proposes several methods to add new links between the existing entities of the KB, based on the learning of representations that optimize a defined energy function. We also propose a novel application that makes use of this structured information to generate questions in natural language.
APA, Harvard, Vancouver, ISO, and other styles
41

Bogadhi, Amarender R. "Une étude expérimentale et théorique de l'intégration de mouvement pour la poursuite lente : Un modèle Bayesien récurrent et hiérarchique." Thesis, Aix-Marseille, 2012. http://www.theses.fr/2012AIXM5009/document.

Full text
Abstract:
Cette thèse se compose de deux parties, concernant deux études expérimentales sur les mouvements oculaires de poursuite lente d'un stimulus visuel en mouvement (barre inclinée). La première étude aborde l'intégration dynamique de signaux locaux de mouvement visuel provenant de la rétine, tandis que la seconde porte sur l'influence de signaux extra-rétiniens sur l'intégration du mouvement. Un cadre théorique plus général est également proposé, sur la base d'un modèle bayésien récurrent et hiérarchique pour la poursuite lente. Pour la première étude, l'intégration dynamique de mouvement a été analysée en variant le contraste et la vitesse de la barre inclinée. Les résultats montrent que des vitesses plus élevées et des valeurs plus basses de contraste produisent un plus fort biais dans la direction initiale de poursuite et que successivement la dynamique d'intégration de mouvement est plus lente pour les contrastes faibles. Une version en boucle ouverte d'un modèle bayésien est proposée, où un réseau bayésien récurrent est connecté en cascade avec un modèle du système oculomoteur pour générer des réponses de poursuite lente. Les réponses du modèle reproduisent qualitativement les différentes dynamiques observées dans les réponses de poursuite à la barre inclinée en fonction des vitesses et des contrastes différents. La deuxième étude a enquêté sur les interactions dynamiques entre les signaux rétiniens et extra-rétiniens dans l'intégration dynamique de mouvement pour la poursuite lente par le moyen d'une suppression transitoire de la cible à différents moments de la poursuite, et notamment au cours de la phase de boucle ouverte et pendant l'état d'équilibre
This thesis addresses two studies of smooth pursuit eye movements for a translating tilted bar stimulus: first, the dynamic integration of local visual motion signals originating from the retina, and second, the influence of extra-retinal signals on motion integration. It also proposes a more general, hierarchical recurrent Bayesian framework for smooth pursuit. The first study investigated dynamic motion integration for varying contrasts and speeds using a tilted bar stimulus. Results show that higher speeds and lower contrasts result in a higher initial direction bias, and that the subsequent dynamics of motion integration is slower for lower contrasts. It proposes an open-loop version of a recurrent Bayesian model in which a recurrent Bayesian network is cascaded with an oculomotor plant to generate smooth pursuit responses. The model responses qualitatively account for the different dynamics observed in smooth pursuit responses to the tilted bar stimulus at different speeds and contrasts. The second study investigated the dynamic interactions between retinal and extra-retinal signals in dynamic motion integration for smooth pursuit by transiently blanking the target at different moments during the open-loop and steady-state phases of pursuit. The results suggest that the weights given to retinal and extra-retinal signals are dynamic in nature, and that extra-retinal signals dominate retinal signals on target reappearance after a blank introduced during the open-loop phase of pursuit, compared to a blank introduced during the steady-state phase. The previous version of the model is updated to a closed-loop version and extended to a hierarchical recurrent Bayesian model.
APA, Harvard, Vancouver, ISO, and other styles
42

Mervin, Lewis. "Improved in silico methods for target deconvolution in phenotypic screens." Thesis, University of Cambridge, 2018. https://www.repository.cam.ac.uk/handle/1810/283004.

Full text
Abstract:
Target-based screening projects for bioactive (orphan) compounds have been shown in many cases to be insufficiently predictive for in vivo efficacy, leading to attrition in clinical trials. Phenotypic screening has hence undergone a renaissance in both academia and in the pharmaceutical industry, partly due to this reason. One key shortcoming of this paradigm shift is that the protein targets modulated need to be elucidated subsequently, which is often a costly and time-consuming procedure. In this work, we have explored both improved methods and real-world case studies of how computational methods can help in target elucidation of phenotypic screens. One limitation of previous methods has been the ability to assess the applicability domain of the models, that is, when the assumptions made by a model are fulfilled and which input chemicals are reliably appropriate for the models. Hence, a major focus of this work was to explore methods for calibration of machine learning algorithms using Platt Scaling, Isotonic Regression Scaling and Venn-Abers Predictors, since the probabilities from well calibrated classifiers can be interpreted at a confidence level and predictions specified at an acceptable error rate. Additionally, many current protocols only offer probabilities for affinity, thus another key area for development was to expand the target prediction models with functional prediction (activation or inhibition). This extra level of annotation is important since the activation or inhibition of a target may positively or negatively impact the phenotypic response in a biological system. Furthermore, many existing methods do not utilize the wealth of bioactivity information held for orthologue species. We therefore also focused on an in-depth analysis of orthologue bioactivity data and its relevance and applicability towards expanding compound and target bioactivity space for predictive studies. The realized protocol was trained with 13,918,879 compound-target pairs and comprises 1,651 targets, which has been made available for public use at GitHub. Consequently, the methodology was applied to aid with the target deconvolution of AstraZeneca phenotypic readouts, in particular for the rationalization of cytotoxicity and cytostaticity in the High-Throughput Screening (HTS) collection. Results from this work highlighted which targets are frequently linked to the cytotoxicity and cytostaticity of chemical structures, and provided insight into which compounds to select or remove from the collection for future screening projects. Overall, this project has furthered the field of in silico target deconvolution, by improving the performance and applicability of current protocols and by rationalizing cytotoxicity, which has been shown to influence attrition in clinical trials.
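For readers who want to see what the calibration step referred to above looks like in practice, here is a minimal, generic scikit-learn sketch on synthetic data: Platt scaling corresponds to `method="sigmoid"` and isotonic regression scaling to `method="isotonic"`; Venn-Abers predictors are not part of scikit-learn and are omitted. This is not the protocol or data used in the thesis.

```python
# Calibrating classifier probabilities with Platt scaling and isotonic regression.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import brier_score_loss

X, y = make_classification(n_samples=3000, n_features=20, weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

base = RandomForestClassifier(n_estimators=200, random_state=0)
for method in ("sigmoid", "isotonic"):                 # Platt scaling / isotonic regression
    calibrated = CalibratedClassifierCV(base, method=method, cv=5).fit(X_tr, y_tr)
    p = calibrated.predict_proba(X_te)[:, 1]
    print(method, "Brier score:", round(brier_score_loss(y_te, p), 4))
```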
APA, Harvard, Vancouver, ISO, and other styles
43

Haris, Daniel. "Optimalizace strojového učení pro predikci KPI." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2018. http://www.nusl.cz/ntk/nusl-385922.

Full text
Abstract:
This thesis aims to optimize machine learning algorithms for predicting KPI metrics for an organization. The organization uses machine learning to predict whether projects meet the planned deadlines of the last phase of the development process. The work focuses on the analysis of prediction models and sets the goal of selecting new candidate models for the prediction system. We have implemented a system that automatically selects the best feature variables for learning. Trained models were evaluated with several performance metrics and the best candidates were chosen for the prediction. The candidate models achieved higher accuracy, which means that the prediction system provides more reliable responses. We also suggested other improvements that could increase the accuracy of the forecast.
APA, Harvard, Vancouver, ISO, and other styles
44

Leang, Isabelle. "Fusion en ligne d'algorithmes de suivi visuel d'objet." Thesis, Paris 6, 2016. http://www.theses.fr/2016PA066486/document.

Full text
Abstract:
Le suivi visuel d’objet est une fonction élémentaire de la vision par ordinateur ayant fait l’objet de nombreux travaux. La dérive au cours du temps est l'un des phénomènes les plus critiques à maîtriser, car elle aboutit à la perte définitive de la cible suivie. Malgré les nombreuses approches proposées dans la littérature pour contrer ce phénomène, aucune ne surpasse une autre en terme de robustesse face aux diverses sources de perturbations visuelles : variation d'illumination, occultation, mouvement brusque de caméra, changement d'aspect. L’objectif de cette thèse est d’exploiter la complémentarité d’un ensemble d'algorithmes de suivi, « trackers », en développant des stratégies de fusion en ligne capables de les combiner génériquement. La chaîne de fusion proposée a consisté à sélectionner les trackers à partir d'indicateurs de bon fonctionnement, à combiner leurs sorties et à les corriger. La prédiction en ligne de dérive a été étudiée comme un élément clé du mécanisme de sélection. Plusieurs méthodes sont proposées pour chacune des étapes de la chaîne, donnant lieu à 46 configurations de fusion possibles. Évaluées sur 3 bases de données, l’étude a mis en évidence plusieurs résultats principaux : une sélection performante améliore considérablement la robustesse de suivi ; une correction de mise à jour est préférable à une réinitialisation ; il est plus avantageux de combiner un petit nombre de trackers complémentaires et de performances homogènes qu'un grand nombre ; la robustesse de fusion d’un petit nombre de trackers est corrélée à la mesure d’incomplétude, ce qui permet de sélectionner la combinaison de trackers adaptée à un contexte applicatif donné
Visual object tracking is an elementary function of computer vision that has been the subject of numerous studies. Drift over time is one of the most critical phenomena to master because it leads to the permanent loss of the target being tracked. Despite the numerous approaches proposed in the literature to counter this phenomenon, none outperforms another in terms of robustness to the various sources of visual perturbation: variation of illumination, occlusion, sudden camera movement, change of aspect. The objective of this thesis is to exploit the complementarity of a set of tracking algorithms by developing on-line fusion strategies capable of combining them generically. The proposed fusion chain consists of selecting the trackers from indicators of good functioning, combining their outputs and correcting them. On-line drift prediction was studied as a key element of the selection mechanism. Several methods are proposed for each step of the chain, giving rise to 46 possible fusion configurations. Evaluated on 3 databases, the study highlighted several key findings: effective selection greatly improves robustness; the correction improves the robustness but is sensitive to bad selection, making updating preferable to reinitialization; it is more advantageous to combine a small number of complementary trackers with homogeneous performances than a large number; and the robustness of fusion of a small number of trackers is correlated with the incompleteness measure, which makes it possible to select the combination of trackers appropriate for a given application context.
APA, Harvard, Vancouver, ISO, and other styles
45

Xu, Jiaofen. "Bagging E-Bayes for Estimated Breeding Value Prediction." Master's thesis, 2009. http://hdl.handle.net/10048/626.

Full text
Abstract:
This work focuses on the evaluation of a bagging EB method in terms of its ability to select a subset of QTL-related markers for accurate EBV prediction. Experiments were performed on several simulated and real datasets consisting of SNP genotypes and phenotypes. The simulated datasets modeled different dominance levels and different levels of background noise. Our results show that the bagging EB method is able to detect most of the simulated QTL, even with large background noise. The average recall of QTL detection was 0.71. When using the markers detected by the bagging EB method to predict EBVs, the prediction accuracy improved dramatically on the simulation datasets compared to using the entire set of markers. However, the prediction accuracy did not improve much when doing the same experiments on the two real datasets. The best accuracy of EBV prediction we achieved for the dairy dataset is 0.57 and the best accuracy for the beef dataset is 0.73.
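The bagging-based marker selection idea can be illustrated generically as below. This is a simple bootstrap-plus-Lasso stand-in for the bagging EB procedure, not the thesis's method: the simulated genotypes, effect sizes, bag count and selection threshold are all invented for the sketch.

```python
# Generic sketch: keep markers that are repeatedly picked across bootstrap
# samples, then use only those markers to predict the trait.
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)
n, p, n_qtl = 400, 300, 10
X = rng.integers(0, 3, size=(n, p)).astype(float)        # SNP genotypes coded 0/1/2
beta = np.zeros(p); beta[:n_qtl] = 1.0                   # the first 10 markers are true QTL
y = X @ beta + rng.normal(0, 2.0, n)                     # simulated phenotypes

counts = np.zeros(p)
for _ in range(50):                                      # bagging over bootstrap samples
    idx = rng.integers(0, n, n)
    fit = Lasso(alpha=0.3).fit(X[idx], y[idx])
    counts += fit.coef_ != 0

selected = np.where(counts >= 25)[0]                     # picked in at least half the bags
train, test = np.arange(0, 300), np.arange(300, n)
model = LinearRegression().fit(X[train][:, selected], y[train])
pred = model.predict(X[test][:, selected])
print("selected markers:", len(selected),
      "| correlation with held-out phenotype:", round(np.corrcoef(pred, y[test])[0, 1], 2))
```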
APA, Harvard, Vancouver, ISO, and other styles
46

Wang, Min-Hsueh, and 王敏學. "Using Bayes Theorem to Establish Fall Risk Prediction System-A Case of Older People." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/ax82sc.

Full text
Abstract:
Master's thesis
Tzu Chi University
Master's Program, Department of Medical Informatics
103
Due to the ageing of Taiwan's population, the proportion of elderly people in the community is increasing, and falls are among the most common accidents in the elderly. Nearly fifty percent of the people involved in fall accidents are injured, so early prevention is very important to reduce the damage caused by falls. Current studies are mostly based on fall detection and notification, but they ignore the principle that prevention is better than cure. In this study, we collect three-axis acceleration values as fall criteria, combine them with physiological measurements and disease history, and calculate the sensitivity and specificity as a basis for predicting the risk of falls. For the development and operation of the platform, we use the widely available software Excel, so that both lay users and health care providers can easily use and maintain the system. We expect that the system can be used in practice to predict fall risk, once its accuracy has been confirmed by public testing and a certain degree of accuracy in fall-risk estimation has been achieved with Bayes' theorem.
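To make the Bayes-theorem step concrete, here is a tiny worked sketch of turning sensitivity, specificity and a prior fall prevalence into a posterior fall risk; all the numbers below are invented for illustration and are not the thesis's values.

```python
# Posterior fall risk from sensitivity, specificity and a prior (prevalence),
# via Bayes' theorem. Illustrative numbers only.
def posterior_risk(prior, sensitivity, specificity):
    """P(fall | positive screen) = sens*prior / (sens*prior + (1-spec)*(1-prior))."""
    true_pos = sensitivity * prior
    false_pos = (1.0 - specificity) * (1.0 - prior)
    return true_pos / (true_pos + false_pos)

print(posterior_risk(prior=0.30, sensitivity=0.85, specificity=0.75))  # ~0.59
```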
APA, Harvard, Vancouver, ISO, and other styles
47

Farrell, John J. "The prediction of HLA genotypes from next generation sequencing and genome scan data." Thesis, 2014. https://hdl.handle.net/2144/14694.

Full text
Abstract:
Genome-wide association studies have very successfully found highly significant disease associations with single nucleotide polymorphisms (SNP) in the Major Histocompatibility Complex for adverse drug reactions, autoimmune diseases and infectious diseases. However, the extensive linkage disequilibrium in the region has made it difficult to unravel the HLA alleles underlying these diseases. Here I present two methods to comprehensively predict 4-digit HLA types from the two types of experimental genome data widely available. The Virtual SNP Imputation approach was developed for genome scan data and demonstrated a high precision and recall (96% and 97% respectively) for the prediction of HLA genotypes. A reanalysis of 6 genome-wide association studies using the HLA imputation method identified 18 significant HLA allele associations for 6 autoimmune diseases: 2 in ankylosing spondylitis, 2 in autoimmune thyroid disease, 2 in Crohn's disease, 3 in multiple sclerosis, 2 in psoriasis and 7 in rheumatoid arthritis. The EPIGEN consortium also used the Virtual SNP Imputation approach to detect a novel association of HLA-A*31:01 with adverse reactions to carbamazepine. For the prediction of HLA genotypes from next generation sequencing data, I developed a novel approach using a naïve Bayes algorithm called HLA-Genotyper. The validation results covered whole genome, whole exome and RNA-Seq experimental designs in the European and Yoruba population samples available from the 1000 Genomes Project. The RNA-Seq data gave the best results with an overall precision and recall near 0.99 for Europeans and 0.98 for the Yoruba population. I then successfully used the method on targeted sequencing data to detect significant associations of idiopathic membranous nephropathy with HLA-DRB1*03:01 and HLA-DQA1*05:01 using the 1000 Genomes European subjects as controls. Using the results reported here, researchers may now readily unravel the association of HLA alleles with many diseases from genome scans and next generation sequencing experiments without the expensive and laborious HLA typing of thousands of subjects. Both algorithms enable the analysis of diverse populations to help researchers pinpoint HLA loci with biological roles in infection, inflammation, autoimmunity, aging, mental illness and adverse drug reactions.
APA, Harvard, Vancouver, ISO, and other styles
48

Askari, Hemmat Reyhane. "SLA violation prediction : a machine learning perspective." Thesis, 2016. http://hdl.handle.net/1866/18754.

Full text
APA, Harvard, Vancouver, ISO, and other styles
49

Kodzaga, Ermin. "Learning controller for prediction of lane change times : A study of driving behaviour using naive Bayes and Artificial Neural Networks." Thesis, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-322983.

Full text
Abstract:
Today's trucks are becoming safer thanks to Advanced Driver Assistance Systems (ADAS). These systems are intended to assist the driver in the driving process and to increase safety both for the driver and for the vehicle's surroundings, and they require strict design criteria to achieve sufficiently high precision and robustness. ADAS are developing rapidly and represent a step towards a fully autonomous vehicle fleet. The main focus of this master thesis project is to investigate the possibility of predicting a driver's typical lane change time before the truck reaches a highway. This was done by attempting to identify the driving behaviour from sensor data collected during non-highway driving. Machine learning techniques, namely naive Bayes and Artificial Neural Networks (ANN), were applied with various combinations of sensor inputs. The results support the assumption that different driving behaviours correspond to different lane change times. However, predicting lane change times in whole seconds proved as difficult as predicting one of three classes (fast, medium and slow), whereas predicting only fast versus slow lane changes gave better results: only one of five validation data sets was predicted incorrectly. There was no large difference in performance between naive Bayes and the designed ANN. The results were not good enough for practical use, and more research is needed; methods for improving performance and directions for future work are also discussed.
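As a rough illustration of the classification set-up described above (not the thesis code; the sensor features and all numbers are invented), a Gaussian naive Bayes classifier can be trained on driving-behaviour features from non-highway driving and queried for a fast/medium/slow lane-change class. An ANN could be fitted to the same data for comparison.

import numpy as np
from sklearn.naive_bayes import GaussianNB

# Hypothetical feature vectors per driver: mean speed (km/h), mean lateral
# acceleration (g) and steering-wheel activity, measured off the highway.
X_train = np.array([[78.0, 0.12, 0.8],
                    [65.0, 0.25, 1.4],
                    [72.0, 0.18, 1.1],
                    [80.0, 0.10, 0.7]])
y_train = np.array(["fast", "slow", "medium", "fast"])  # observed lane-change classes

clf = GaussianNB().fit(X_train, y_train)
print(clf.predict([[70.0, 0.20, 1.2]]))  # predicted class for an unseen driver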
APA, Harvard, Vancouver, ISO, and other styles
50

Prasetio, Murman Dwi. "AN INVESTIGATION OF RELATIONSHIP BETWEEN PREDICTION WORD AND SUBTASK CATEGORY IN TASK ANALYSIS – A NAIVE BAYES BASED MACHINE LEARNING APPROACH." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/70666453826288063516.

Full text
Abstract:
Master's thesis
National Taiwan University of Science and Technology
Department of Industrial Management
Academic year 100 (2011/2012)
Traditionally, indexing and searching of speech content in task analysis has been achieved through a combination of separately constructed natural language processing engines. Natural language processing is grounded in speech, which is the primary and most natural and efficient mode of communication among humans, so it is logical that natural language speech recognition is the next technological step for Human Computer Interaction (HCI). However, as computer systems and their user interfaces evolve, a task analysis of users' current activities is not sufficient to guess which tasks users will perform after the previous ones. In Lin and Lehto's study, a Bayesian-based semi-automated task analysis tool was developed to help task analysts predict the categories of tasks and subtasks performed by knowledge agents in telephone conversations in which the agents were helping customers troubleshoot their problems. The purpose of this study is to examine the dataset established by Lin and Lehto (2007) and to further analyse the results of the Bayesian-based task analysis model proposed by Lin and Lehto (2009), by comparing results on the existing dataset between two Bayesian open-source machine learning tools, Text Miner and RapidMiner (Hofmann, M. and Klinkenberg, R., 2009). In this analysis, the RapidMiner program generated fifteen kinds of prediction word combinations from the telephone dialogues between call centre agents and customers: single words, pair words, triple words, quadruple words, single-pair words, single-triple words, single-quadruple words, pair-triple words, pair-quadruple words, triple-quadruple words, single-pair-triple words, single-pair-quadruple words, single-triple-quadruple words, pair-triple-quadruple words, and single-pair-triple-quadruple words. To identify the relationship between prediction words and main subtask categories, this study compares the two tools, RapidMiner and Text Miner. The results show that the naive Bayes based RapidMiner tool performs worse than Text Miner at predicting the relationship between prediction words and main subtask categories. Both RapidMiner and Text Miner use 71 subtask categories for the 5184 narrative dialogue records. The precision rate of RapidMiner was 33% over all narratives, 26% on the testing set and 35% on the training set, with an average correct prediction probability of 19.91%. A total of 11 categories had correct predictions over 50%, while 39 categories had correct predictions below 50%. By comparison, in the Text Miner results for the main subtask categories in the fuzzy Bayesian task analysis, 13 categories had correct predictions of 80% or above and 34 categories had correct predictions of 50% or above. However, since Text Miner is still under development, further analysis with the same dataset is needed to confirm these findings and to compare other text-processing tools based on different algorithms or model developments.
APA, Harvard, Vancouver, ISO, and other styles
