Dissertations on the topic "Module clustering"
Browse the top 50 dissertations for research on the topic "Module clustering".
Ptitsyn, Andrey. "New algorithms for EST clustering." Thesis, University of the Western Cape, 2000. http://etd.uwc.ac.za/index.php?module=etd&.
Passmoor, Sean Stuart. "Clustering studies of radio-selected galaxies." Thesis, University of the Western Cape, 2011. http://etd.uwc.ac.za/index.php?module=etd&action=viewtitle&id=gen8Srv25Nme4_7521_1332410859.
We investigate the clustering of HI-selected galaxies in the ALFALFA survey and compare the results with those obtained for HIPASS. Measurements of the angular correlation function and the inferred 3D clustering are compared with results from direct spatial-correlation measurements. We are able to measure clustering on smaller angular scales and for galaxies with lower HI masses than was previously possible. We calculate the expected clustering of dark matter using the redshift distributions of HIPASS and ALFALFA and show that the ALFALFA sample is somewhat more anti-biased with respect to dark matter than the HIPASS sample. We confirm the validity of the dark matter correlation predictions by performing simulations of non-linear structure formation. Further, we examine how the bias evolves with redshift for radio galaxies detected in the FIRST survey.
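The abstract does not name the estimator used for the angular correlation function. A common choice, assumed here purely for illustration, is the Landy-Szalay estimator built from normalized data-data (DD), data-random (DR) and random-random (RR) pair counts at angular separation θ; a linear bias b then compares the galaxy and dark-matter correlation functions, with b < 1 meaning the sample is anti-biased:

```latex
\hat{w}(\theta) = \frac{DD(\theta) - 2\,DR(\theta) + RR(\theta)}{RR(\theta)},
\qquad
b^{2}(r) = \frac{\xi_{\mathrm{gal}}(r)}{\xi_{\mathrm{DM}}(r)}
```

Here ξ_gal and ξ_DM are the spatial (3D) correlation functions of the galaxies and of the dark matter; defining the bias through this ratio is an assumed convention, not a statement of the thesis's exact definition.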
Javar, Shima. "Measurement and comparison of clustering algorithms." Thesis, Växjö University, School of Mathematics and Systems Engineering, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:vxu:diva-1735.
In this project, a number of different clustering algorithms are described and their workings explained. They are compared with each other by applying them to a number of graphs with a known architecture.
These clustering algorithms, in the order in which they are implemented, are as follows: Nearest neighbour hillclimbing, Nearest neighbour big step hillclimbing, Best neighbour hillclimbing, Best neighbour big step hillclimbing, Gem 3D, K-means simple, K-means Gem 3D, One cluster, and One cluster per node.
The graphs are Unconnected, Directed KX, Directed Cycle KX and Directed Cycle.
The results of these clusterings are compared according to three criteria: Time, Quality, and Extremity of node distribution. This shows which algorithm is most suitable for which graph. These artificial graphs are then compared with the reference architecture graph to reach the conclusions.
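To make the hill-climbing idea concrete, here is a minimal sketch of first-improvement ("nearest neighbour"-style) hill climbing over graph partitions. The objective, a TurboMQ-style modularization quality as used by the Bunch module-clustering tool, is an assumption: the abstract does not say how "Quality" was scored, and `turbo_mq` and `hill_climb` are hypothetical names.

```python
from collections import defaultdict


def turbo_mq(edges, clusters):
    """TurboMQ-style modularization quality of a partition (higher is better).

    edges: iterable of directed (src, dst) pairs.
    clusters: dict mapping each node to a cluster id.
    Each cluster scores CF = 2*intra / (2*intra + inter), where `inter` counts
    edges with exactly one endpoint in the cluster (in either direction).
    """
    intra = defaultdict(int)
    inter = defaultdict(int)
    for u, v in edges:
        cu, cv = clusters[u], clusters[v]
        if cu == cv:
            intra[cu] += 1
        else:
            inter[cu] += 1
            inter[cv] += 1
    return sum(2 * intra[c] / (2 * intra[c] + inter[c])
               for c in set(clusters.values()) if intra[c] > 0)


def hill_climb(nodes, edges, clusters):
    """First-improvement hill climbing: move one node at a time to another
    existing cluster, keep the first move that raises the score, and stop
    when a full pass over the nodes finds no improving move."""
    clusters = dict(clusters)
    best = turbo_mq(edges, clusters)
    improved = True
    while improved:
        improved = False
        for node in nodes:
            original = clusters[node]
            for target in set(clusters.values()) - {original}:
                clusters[node] = target
                score = turbo_mq(edges, clusters)
                if score > best:
                    best, improved = score, True
                    break  # keep this move
                clusters[node] = original  # undo a non-improving move
    return clusters, best
```

A best-neighbour variant would evaluate every candidate move before committing to the best one; the first-improvement version is cheaper per step but may follow a different path to a (possibly different) local optimum.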
Hu, Yang. "PV Module Performance Under Real-world Test Conditions - A Data Analytics Approach." Case Western Reserve University School of Graduate Studies / OhioLINK, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=case1396615109.
Riedl, Pavel. "Modul shlukové analýzy systému pro dolování z dat" [Cluster analysis module of a data mining system]. Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2010. http://www.nusl.cz/ntk/nusl-237095.
Handfield, Louis-François. "Cis-regulatory modules clustering from sequence similarity." Thesis, McGill University, 2007. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=112632.
Wu, Jingwen. "Model-based clustering and model selection for binned data." Thesis, Supélec, 2014. http://www.theses.fr/2014SUPL0005/document.
This thesis studies Gaussian mixture model-based clustering approaches and model selection criteria for binned data clustering. Fourteen binned-EM algorithms and fourteen bin-EM-CEM algorithms are developed for fourteen parsimonious Gaussian mixture models. These new algorithms combine the reduction in computation time obtained by binning the data with the simplified parameter estimation offered by parsimonious Gaussian mixture models. The complexities of the binned-EM and bin-EM-CEM algorithms are calculated and compared with those of the EM and CEM algorithms, respectively. To select a model that fits the data well and satisfies the clustering precision requirements within a reasonable computation time, the AIC, BIC, ICL, NEC, and AWE criteria are extended to binned data clustering when the proposed binned-EM and bin-EM-CEM algorithms are used. The advantages of the different proposed methods are illustrated through experimental studies.
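The abstract does not spell out the fourteen parsimonious models. In the usual formulation (the spectral parametrization of Celeux and Govaert, assumed here), each component covariance in dimension d is factored into volume, orientation and shape:

```latex
\Sigma_k = \lambda_k \, D_k \, A_k \, D_k^{\top},
\qquad \lambda_k = |\Sigma_k|^{1/d}, \quad D_k^{\top} D_k = I_d, \quad |A_k| = 1
```

Here λ_k is a scalar volume, D_k is the orthogonal matrix of eigenvectors (orientation), and A_k is a diagonal shape matrix. Holding each factor equal across components, letting it vary, or fixing it to the identity yields 8 general, 4 diagonal and 2 spherical models, fourteen in total; for example, the spherical equal-volume model EII sets Σ_k = λ I_d for every k.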
Sampson, Joshua Neil. "Clustering genes in genetical genomics." Thesis, 2007. http://hdl.handle.net/1773/9549.
Yelibi, Lionel. "Introduction to fast Super-Paramagnetic Clustering." Master's thesis, Faculty of Science, 2019. http://hdl.handle.net/11427/31332.
Mair, Patrick, and Marcus Hudec. "Session Clustering Using Mixtures of Proportional Hazards Models." Department of Statistics and Mathematics, WU Vienna University of Economics and Business, 2008. http://epub.wu.ac.at/598/1/document.pdf.
Series: Research Report Series / Department of Statistics and Mathematics
Lu, Zhengdong. "Constrained clustering and cognitive decline detection." 2008. http://content.ohsu.edu/u?/etd,650.
Madsen, Christopher. "Clustering of the Stockholm County housing market." Thesis, KTH, Matematisk statistik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-252301.
In this thesis, the Stockholm County housing market has been clustered using different clustering methods. The data have been processed and different geographical constraints have been applied. DeSO (Demografiska Statistiska Områden, demographic statistical areas), developed by SCB (Statistics Sweden), has been used to divide the housing market into smaller regions for which area attributes have been computed. Hierarchical clustering methods, SKATER, and Gaussian mixture models have been applied. Methods that use different types of geographical constraints have also been applied in an attempt to create more geographically contiguous clusters. The different methods are then compared with respect to quality and stability. The best method with respect to quality is a Gaussian mixture model called EII, also known as K-means. The most stable method is the ClustGeo method.
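The EII/K-means link holds, strictly, for the classification (hard-assignment) version of the EII mixture with equal mixing proportions. A minimal sketch of plain Lloyd's K-means follows; it does not reproduce SKATER, ClustGeo, or any geographic constraint, and `kmeans` is a hypothetical helper name.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centre assignment and mean update.

    points: list of equal-length numeric tuples; returns (labels, centres).
    Centres are initialised by sampling k of the input points.
    """
    rng = random.Random(seed)
    centres = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centre (Euclidean).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centres[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster empties
                centres[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels, centres
```

Because the equal spherical covariance of EII makes the log-likelihood a function of squared Euclidean distance only, the hard-assignment E-step reduces to the nearest-centre rule above.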
Yan, Guohua. "Linear clustering with application to single nucleotide polymorphism genotyping." Thesis, University of British Columbia, 2008. http://hdl.handle.net/2429/958.
Corneli, Marco. "Dynamic stochastic block models, clustering and segmentation in dynamic graphs." Thesis, Paris 1, 2017. http://www.theses.fr/2017PA01E012/document.
This thesis focuses on the statistical analysis of dynamic graphs, defined in either discrete or continuous time. We introduce a new extension of the stochastic block model (SBM) for dynamic graphs. The proposed approach, called dSBM, adopts non-homogeneous Poisson processes to model the interaction times between pairs of nodes in dynamic graphs, in either discrete or continuous time. The intensity functions of the processes depend only on the node clusters, in a block-modelling perspective. Moreover, all the intensity functions share some regularity properties on hidden time intervals that need to be estimated. A recent estimation algorithm for SBM, based on the greedy maximization of an exact criterion (exact ICL), is adopted for inference and model selection in dSBM. Moreover, an exact algorithm for change-point detection in time series, the "pruned exact linear time" (PELT) method, is extended to deal with dynamic graph data modelled via dSBM. The proposed approach can be used for change-point analysis in graph data. Finally, a further extension of dSBM is developed to analyse dynamic networks with textual edges (such as social networks). In this context, the graph edges are associated with documents exchanged between the corresponding vertices. The textual content of the documents can provide additional information about the topological structure of the dynamic graph. The new model we propose is called the "dynamic stochastic topic block model" (dSTBM).
Graphs are mathematical structures well suited to modelling interactions between objects or actors of interest. Several real networks, such as communication networks, financial transaction networks, mobile telephone networks, and social networks (Facebook, LinkedIn, etc.), can be modelled via graphs. When observing a network, the time variable comes into play in two different ways: we can study the dates at which the interactions occur and/or the durations of the interactions.
This thesis focuses only on the first time dimension, and each interaction is assumed to be instantaneous for simplicity. Hence, the network evolution is given by the interaction dates only. In this framework, graphs can be used in two different ways to model networks. Discrete time […] Continuous time […]. In this thesis, both perspectives are adopted in turn. We consider new unsupervised methods to cluster the vertices of a graph into groups with homogeneous connection profiles. In this manuscript, the node groups are assumed to be time-invariant to avoid possible identifiability issues. Moreover, the approaches that we propose aim to detect structural changes in the way the node clusters interact with each other. The building block of this thesis is the stochastic block model (SBM), a probabilistic approach initially used in the social sciences. The standard SBM assumes that the nodes of a graph belong to hidden (disjoint) clusters and that the probability of observing an edge between two nodes depends only on their clusters. Since no further assumption is made on the connection probabilities, SBM is a very flexible model able to detect different network topologies (hubs, stars, communities, etc.).
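The standard SBM described above can be sketched in a few lines: sample a directed graph given the hidden clusters and a block connection matrix, then recover the block probabilities by maximum likelihood when the clusters are known. This covers only the static Bernoulli SBM; a dSBM would instead draw Poisson interaction counts whose intensity depends on the cluster pair and the time segment, which is not shown. Function names are hypothetical.

```python
import random


def sample_sbm(z, pi, seed=0):
    """Sample a directed SBM graph without self-loops.

    z[i] is the (hidden) cluster of node i; edge i -> j is drawn
    independently with probability pi[z[i]][z[j]].
    """
    rng = random.Random(seed)
    n = len(z)
    return {(i, j) for i in range(n) for j in range(n)
            if i != j and rng.random() < pi[z[i]][z[j]]}


def estimate_pi(z, edges, k):
    """Maximum-likelihood block probabilities given known clusters z:
    the fraction of ordered node pairs in each block pair that carry an edge."""
    hits = [[0] * k for _ in range(k)]
    pairs = [[0] * k for _ in range(k)]
    n = len(z)
    for i in range(n):
        for j in range(n):
            if i != j:
                pairs[z[i]][z[j]] += 1
                hits[z[i]][z[j]] += (i, j) in edges
    return [[hits[a][b] / pairs[a][b] if pairs[a][b] else 0.0
             for b in range(k)] for a in range(k)]
```

In practice the clusters z are unknown and must be inferred jointly with the block parameters (for example by the greedy exact-ICL search mentioned above); the estimator here is only the inner step once z is fixed.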
Mak, Brian Kan-Wing. "Towards a compact speech recognizer: subspace distribution clustering hidden Markov model." 1998. http://content.ohsu.edu/u?/etd,215.
Повний текст джерелаHajj, Hussein Rami. "Étude Raman des alliages (Ge,Si), (Zn,Be)Se et Zn(Se,S) via le modèle de percolation : agrégation vs. dispersion et phonon-polaritons." Thesis, Université de Lorraine, 2014. http://www.theses.fr/2014LORR0103/document.
The ins and outs of the phenomenological percolation model (multi-mode per bond), developed by the team for the basic understanding of the Raman and infrared spectra of semiconductor alloys with zincblende (II-VI and III-V) and diamond (IV-IV) structures, are further explored in novel areas with the Ge1-xSix (diamond), Zn1-xBexSe (zincblende), and ZnSe1-xSx (zincblende) alloys. The version of the percolation model worked out for the GeSi diamond alloy (3 bonds, 6 modes/phonons), more refined than the current one for zincblende alloys (2 bonds, 3 phonons), is used as a model version to formalize, via the introduction of a relevant order parameter k, an intrinsic ability of the vibration spectra to 'measure' the nature of the alloy disorder, that is, whether it reflects a random substitution or a trend towards local clustering or local anticlustering. The percolation-type Zn0.67Be0.33Se alloy is used as a model system to study, with an unconventional Raman setup corresponding to forward scattering, the dispersion of the transverse optic phonons on approaching Γ, the centre of the Brillouin zone. At this limit, such modes acquire a macroscopic electric field similar in every respect to that carried by a pure electromagnetic wave, namely a photon, and are then identified as phonon-polaritons. A specific feature of the alloy-related phonon-polaritons, namely their reinforcement on approaching Γ, unexplored so far, is further investigated experimentally with the Zn0.47Be0.53Se and ZnSe0.68S0.32 alloys, selected on purpose, and was indeed confirmed in the latter alloy. A recent infrared study of ZnSeS in the literature revealed a puzzling multi-phonon pattern for its shorter bond species (Zn-S).
We show that such a pattern can be explained within a generalized version of the percolation scheme, more sophisticated than the standard version, that takes into account the effect of phonon dispersion in addition to the effect of local strain. Besides, a refined study of the phonon-polariton regime related to the long Zn-Se bond reveals an unsuspected bimodal pattern, which echoes the one evidenced earlier for the short (Zn-S) species. This establishes on an experimental basis that the percolation scheme (multi-phonon per bond) is generic and applies, in principle, to any bond species in an alloy. Last, we explore the behavior of the Zn-S doublet of ZnSeS on approaching the zincblende→rocksalt transition (~14 GPa), by near-forward Raman scattering under pressure, i.e. in the phonon-polariton regime. The low-frequency Zn-S mode appears to weaken and converge onto the high-frequency Zn-S mode under pressure, as observed earlier for the Be-Se doublet of ZnBeSe in backscattering. Such behavior seems to be intrinsic to the percolation-type doublet for the considered structural phase transition. This would reflect a sensitivity to the local instabilities of the host bonds (Zn-Se) on approaching their natural structural phase transition, characteristic of the related pure compound (ZnSe). The above-mentioned behaviors are discussed on the basis of a detailed contour modeling of the Raman spectra taken in backscattering (the usual geometry) and in forward scattering (which then depends on the scattering angle), within the scope of the linear dielectric response. The Raman modes are assigned via ab initio phonon calculations done with the SIESTA code using prototype impurity motifs. The predictions of the percolation scheme concerning the k-dependence of the GeSi Raman spectra are compared with direct ab initio calculations of the GeSi Raman spectra, done in collaboration (with V.J.B. Torres) using the AIMPRO code on supercells covering a selection of representative k values.
Louw, Jan Paul. "Evidence of volatility clustering on the FTSE/JSE top 40 index." Thesis, Stellenbosch : Stellenbosch University, 2008. http://hdl.handle.net/10019.1/5039.
ENGLISH ABSTRACT: This research report investigated whether evidence of volatility clustering exists on the FTSE/JSE Top 40 Index. The presence of volatility clustering has practical implications for market decisions as well as for the accurate measurement and reliable forecasting of volatility. The report conducted an in-depth analysis of volatility, measured over five different return interval sizes covering the sample in non-overlapping periods. The volatility at each return interval size was analysed to reveal its distributional characteristics and whether it violated the normality assumption. The volatility was also analysed to identify in which way, if any, subsequent periods are correlated. For each interval size, one-step-ahead volatility forecasts were produced using Linear Regression, Exponential Smoothing, GARCH(1,1), and EGARCH(1,1) models. The results were analysed using appropriate criteria to determine which of the forecasting models performed best. The forecasting models range from very simple to very complex; the rationale was to determine whether more complex models outperform simpler ones. The analysis showed sufficient evidence to conclude that there was volatility clustering on the FTSE/JSE Top 40 Index. It further showed that more complex models such as GARCH(1,1) and EGARCH(1,1) only marginally outperformed less complex models and do not offer any real benefit over simpler models such as Linear Regression. This can be ascribed to the mean-reversion effect of volatility and gives further insight into the volatility structure over the sample period.
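The GARCH(1,1) one-step-ahead forecast mentioned in the abstract follows the recursion σ²(t+1) = ω + α ε²(t) + β σ²(t), whose long-run level is ω / (1 − α − β) when α + β < 1. A minimal sketch, assuming mean-zero returns, a recursion started at the sample second moment, and parameters that are given rather than estimated; the exponential-smoothing variant with λ = 0.94 (the common RiskMetrics choice) is an illustrative assumption, not necessarily the thesis's specification.

```python
def garch11_forecast(returns, omega, alpha, beta):
    """One-step-ahead conditional variance from the GARCH(1,1) recursion
    sigma2[t+1] = omega + alpha * eps[t]**2 + beta * sigma2[t].

    Assumes mean-zero returns (eps[t] = returns[t]) and starts the recursion
    at the sample second moment; parameters are taken as given, not fitted.
    """
    if not returns:
        raise ValueError("need at least one return")
    var = sum(r * r for r in returns) / len(returns)  # sigma2[1]
    for r in returns:
        var = omega + alpha * r * r + beta * var
    return var  # sigma2[n+1]


def ewma_forecast(returns, lam=0.94):
    """Exponentially smoothed variance: the special case omega=0,
    alpha=1-lam, beta=lam of the same recursion."""
    return garch11_forecast(returns, 0.0, 1.0 - lam, lam)
```

Writing exponential smoothing as a constrained GARCH(1,1) makes the comparison in the abstract concrete: the extra freedom of GARCH lies in ω (mean reversion towards a long-run variance) and in decoupling α from β.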
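Masmoudi, Nesrine. "Modèle bio-inspiré pour le clustering de graphes : application à la fouille de données et à la distribution de simulations" [Bio-inspired model for graph clustering: application to data mining and to the distribution of simulations]. Thesis, Normandie, 2017. http://www.theses.fr/2017NORMLH26/document.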
In this work, we present a novel method based on the behavior of real ants for solving the unsupervised non-hierarchical classification problem. This approach dynamically creates data groups. It is based on the concept of artificial ants that move simultaneously in complex ways while following simple local rules. Each ant represents one data item in the algorithm. The movements of the ants aim to create homogeneous data groups that evolve together in a graph structure. We also propose a method for incrementally building neighborhood graphs with artificial ants. We propose two approaches derived from biomimetic algorithms; they are hybrid in the sense that the search for the number of classes starts from a partition produced by the classical K-means algorithm, which is used to initialize the first partition and the graph structure.
Revillon, Guillaume. "Uncertainty in radar emitter classification and clustering." Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLS098/document.
In electronic warfare, radar signal identification is a key asset for decision making in military tactical situations. By providing information about the presence of threats, the classification and clustering of radar signals play a significant role in ensuring that countermeasures against enemies are well chosen, and they enable the detection of unknown radar signals so that databases can be updated. Most of the time, Electronic Support Measures systems receive mixtures of signals from different radar emitters in the electromagnetic environment. Hence a radar signal, described by a pulse-to-pulse modulation pattern, is often only partially observed owing to missing measurements and measurement errors. The identification process relies on statistical analysis of basic measurable parameters of a radar signal, which constitute both quantitative and qualitative data. Many general and practical approaches based on data fusion and machine learning have been developed; they traditionally proceed through feature extraction, dimensionality reduction, and classification or clustering. However, these algorithms cannot handle missing data, so imputation methods are required to generate data before they can be used. Hence, the main objective of this work is to define a classification/clustering framework that handles both outliers and missing values for any type of data. Here, an approach based on mixture models is developed, since mixture models provide a mathematically grounded, flexible, and meaningful framework for the wide variety of classification and clustering requirements. The proposed approach introduces latent variables that make it possible to handle the sensitivity of the model to outliers and to allow less restrictive modelling of missing data.
A Bayesian treatment is adopted for model learning, supervised classification, and clustering, and inference is performed through a variational Bayesian approximation, since the joint posterior distribution of the latent variables and parameters is intractable. Numerical experiments on synthetic and real data show that the proposed method provides more accurate results than standard algorithms.
Laclau, Charlotte. "Hard and fuzzy block clustering algorithms for high dimensional data." Thesis, Sorbonne Paris Cité, 2016. http://www.theses.fr/2016USPCB014.
With the increasing amount of available data, unsupervised learning has become an important tool for discovering underlying patterns without the need to label instances manually. Among the different approaches proposed to tackle this problem, clustering is arguably the most popular. Clustering is usually based on the assumption that each group, also called a cluster, is distributed around a center defined in terms of all features, whereas in some real-world applications dealing with high-dimensional data this assumption may be false. To address this, co-clustering algorithms were proposed to describe clusters by the subsets of features that are most relevant to them. The resulting latent structure of the data is composed of blocks, usually called co-clusters. In the first two chapters, we describe two co-clustering methods that differentiate the relevance of features according to their capability of revealing the latent structure of the data, in both probabilistic and distance-based frameworks. The probabilistic approach uses the mixture model framework, where the irrelevant features are assumed to follow a different probability distribution that is independent of the co-clustering structure. The distance-based (also called metric-based) approach relies on an adaptive metric in which each variable is assigned a weight that defines its contribution to the resulting co-clustering. From a theoretical point of view, we show the global convergence of the proposed algorithms using Zangwill's convergence theorem. In the last two chapters, we consider a special case of co-clustering where, contrary to the original setting, each subset of instances is described by a unique subset of features, resulting in a diagonal structure of the initial data matrix. As for the first two contributions, we consider both probabilistic and metric-based approaches.
The main idea of the proposed contributions is to impose two different kinds of constraints: (1) we fix the number of row clusters to the number of column clusters; (2) we seek a structure of the original data matrix that has the maximum values on its diagonal (for instance, for binary data, we look for diagonal blocks composed of ones, with zeros outside the main diagonal). The proposed approaches enjoy convergence guarantees derived from the results of the previous chapters. Finally, we present both hard and fuzzy versions of the proposed algorithms. We evaluate our contributions on a wide variety of synthetic and real-world benchmark binary and continuous data sets related to text mining applications, and we analyze the advantages and disadvantages of each approach. To conclude, we believe that this thesis explicitly covers the vast majority of possible scenarios arising in hard and fuzzy co-clustering and can be seen as a generalization of some popular biclustering approaches.
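The two constraints above can be illustrated on binary data with a hypothetical hard, metric-style sketch (not the thesis's algorithm): rows and columns share the same k labels, and the criterion counts ones inside diagonal blocks plus zeros outside them. Moving column j to label c changes that count by a constant plus the sum of (2x − 1) over the rows labelled c, so alternating column and row updates can never decrease the criterion.

```python
def agreement(X, rows, cols):
    """Count cells that fit the diagonal pattern: ones inside diagonal
    blocks (row label == column label), zeros outside them."""
    return sum((X[i][j] == 1) == (rows[i] == cols[j])
               for i in range(len(X)) for j in range(len(X[0])))


def diagonal_cocluster(X, init_rows, k, max_iter=50):
    """Alternate column and row updates for a binary matrix X (list of lists).

    Each update gives an item the label c maximizing the sum of (2*x - 1)
    over the opposite-side items labelled c, which never lowers `agreement`.
    """
    n, m = len(X), len(X[0])
    rows, cols = list(init_rows), None

    def best(scores):
        return max(range(k), key=lambda c: scores[c])

    for _ in range(max_iter):
        new_cols = []
        for j in range(m):  # column step with row labels fixed
            scores = [0] * k
            for i in range(n):
                scores[rows[i]] += 2 * X[i][j] - 1
            new_cols.append(best(scores))
        new_rows = []
        for i in range(n):  # row step with the new column labels fixed
            scores = [0] * k
            for j in range(m):
                scores[new_cols[j]] += 2 * X[i][j] - 1
            new_rows.append(best(scores))
        if new_rows == rows and new_cols == cols:
            break  # fixed point reached
        rows, cols = new_rows, new_cols
    return rows, cols
```

Like other alternating schemes, this converges to a local optimum that depends on `init_rows`; a poor start (for example, one that leaves every block tied) can stall on a trivial labelling.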
Zeng, Jingying. "Latent Factor Models for Recommender Systems and Market Segmentation Through Clustering." The Ohio State University, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=osu1491255524283942.
Bartcus, Marius. "Bayesian non-parametric parsimonious mixtures for model-based clustering." Thesis, Toulon, 2015. http://www.theses.fr/2015TOUL0010/document.
This thesis focuses on statistical learning and multi-dimensional data analysis, in particular on unsupervised learning of generative models for model-based clustering. We study Gaussian mixture models, both in the context of maximum likelihood estimation via the EM algorithm and in the Bayesian estimation context, by maximum a posteriori via Markov chain Monte Carlo (MCMC) sampling techniques. We mainly consider parsimonious mixture models, which are based on a spectral decomposition of the covariance matrix and provide a flexible framework, particularly for the analysis of high-dimensional data. Then, we investigate non-parametric Bayesian mixtures based on general flexible processes such as the Dirichlet process and the Chinese restaurant process. This non-parametric model formulation is relevant both for learning the model and for dealing with the issue of model selection. We propose new Bayesian non-parametric parsimonious mixtures and derive an MCMC sampling technique in which the mixture model and the number of mixture components are learned simultaneously from the data. The model structure is selected using Bayes factors. These models, by their non-parametric and sparse formulation, are useful for the analysis of large data sets when the number of classes is undetermined and increases with the data, and when the dimension is high. The models are validated on simulated data and standard real data sets. Then, they are applied to the difficult real-world problem of automatically structuring complex bioacoustic data from whale song signals. Finally, we open Markovian perspectives via hierarchical Dirichlet process hidden Markov models.
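The Chinese restaurant process mentioned above is the prior over partitions that lets the number of components grow with the data. A minimal sketch that only samples from this prior (it is not the thesis's parsimonious-mixture MCMC sampler) and computes the prior mean number of clusters; function names are hypothetical.

```python
import random


def sample_crp(n, alpha, seed=0):
    """Draw a partition of n items from a Chinese restaurant process.

    Item i joins existing table t with probability counts[t] / (i + alpha)
    and opens a new table with probability alpha / (i + alpha).
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    rng = random.Random(seed)
    labels, counts = [], []
    for _ in range(n):
        weights = counts + [alpha]  # last slot = open a new table
        t = rng.choices(range(len(weights)), weights=weights)[0]
        if t == len(counts):
            counts.append(1)
        else:
            counts[t] += 1
        labels.append(t)
    return labels, counts


def expected_tables(n, alpha):
    """Prior mean number of clusters, sum_{i=0}^{n-1} alpha / (alpha + i)."""
    return sum(alpha / (alpha + i) for i in range(n))
```

The expected number of tables grows roughly like α log n, which is why such priors suit settings where the number of classes is undetermined and increases with the data.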
Monteiro, Carla Claudia da Rocha Rego. "Bi-clustering de Dados Genéticos Binários Baseado em Modelos de Classificação Logística" [Bi-clustering of binary genetic data based on logistic classification models]. Universidade Federal de Pernambuco, 2009. https://repositorio.ufpe.br/handle/123456789/6991.
Information on protein interactions is fundamental for understanding cellular processes. For this reason, several approaches have been proposed to infer protein pairs in networks from all kinds of biological data. This thesis proposes a bi-clustering method, Lbic, based on a logistic classification model, for analysing binary biological data. Lbic is compared with two other bi-clustering methods from the literature and shows better results. Its performance is also compared with that of a supervised method, kernel canonical correlation analysis, applied to the same data sets. The results show that Lbic outperforms the supervised approach trained with up to 25% of the knowledge of the target network.
Bechchi, Mounir. "Clustering-based Approximate Answering of Query Result in Large and Distributed Databases." PhD thesis, Université de Nantes, 2009. http://tel.archives-ouvertes.fr/tel-00475917.
Talavera, Edwin Rafael Villanueva. "Métodos Bayesianos aplicados em taxonomia molecular" [Bayesian methods applied to molecular taxonomy]. Universidade de São Paulo, 2007. http://www.teses.usp.br/teses/disponiveis/18/18152/tde-03102007-105125/.
This work presents two clustering methods intended for application in molecular taxonomy. These methods are based on probabilistic models that overcome some problems observed in traditional clustering methods, such as the difficulty of knowing which distance metric should be used or the lack of treatment of available prior information. The proposed methods use Bayes' theorem to combine the information in the data with the available prior information, which is why they are called Bayesian methods. The first method implemented in this work is hierarchical Bayesian clustering, an agglomerative hierarchical method that constructs a hierarchy of partitions (a dendrogram) guided by the criterion of maximum Bayesian posterior probability of the partition. The second method is based on a type of probabilistic graphical model known as a conditional Gaussian network, which was adapted for data clustering. Both methods were validated on three data sets in which the labels are known. The methods were also applied to a real problem: the clustering of a Brazilian collection of bacterial strains belonging to the genus Bradyrhizobium, known for their capacity to transform atmospheric nitrogen (N₂) into nitrogen compounds useful for their host plants. This data set consists of genetic data resulting from the analysis of ribosomal RNA. The results showed that the hierarchical Bayesian clustering method built good-quality dendrograms, in some cases better than those of the other hierarchical methods. The method based on the conditional Gaussian network gave acceptable results, making adequate use of the prior information (about the clusters) to determine the optimal number of clusters and to improve the quality of the groups.
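The agglomerative Bayesian criterion described above can be sketched for binary features. The priors are assumptions, since the abstract does not state them: an independent Beta(1, 1) prior on each feature's probability of a 1 within a cluster, and a uniform prior over partitions, so merging two clusters changes the log posterior by the difference of log marginal likelihoods. Function names are hypothetical.

```python
import math
from itertools import combinations


def log_beta(x, y):
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


def log_marginal(cluster, data, a=1.0, b=1.0):
    """Log marginal likelihood of the binary rows in `cluster`, with an
    independent Beta(a, b) prior on each feature's probability of a 1."""
    total = 0.0
    for j in range(len(data[0])):
        ones = sum(data[i][j] for i in cluster)
        zeros = len(cluster) - ones
        total += log_beta(a + ones, b + zeros) - log_beta(a, b)
    return total


def bayesian_agglomerative(data):
    """Greedy agglomeration: repeatedly merge the pair of clusters whose
    merge most increases the log posterior of the partition (uniform prior
    over partitions assumed). Returns [(merged_cluster, gain), ...]."""
    clusters = [frozenset([i]) for i in range(len(data))]
    merges = []

    def gain(pair):
        a, b = clusters[pair[0]], clusters[pair[1]]
        return (log_marginal(a | b, data)
                - log_marginal(a, data) - log_marginal(b, data))

    while len(clusters) > 1:
        i, j = max(combinations(range(len(clusters)), 2), key=gain)
        g = gain((i, j))
        merged = clusters[i] | clusters[j]
        del clusters[j], clusters[i]  # j > i, so deleting j first keeps i valid
        clusters.append(merged)
        merges.append((merged, g))
    return merges
```

The merge sequence defines the dendrogram; stopping at the first merge with a negative gain picks the highest-posterior partition along this greedy path, which is not guaranteed to be the global maximum.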
Freudenberg, Johannes M. "Bayesian Infinite Mixture Models for Gene Clustering and Simultaneous Context Selection Using High-Throughput Gene Expression Data." University of Cincinnati / OhioLINK, 2009. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1258660232.
KC, Rabi. "Study of Some Biologically Relevant Dynamical System Models: (In)stability Regions of Cyclic Solutions in Cell Cycle Population Structure Model Under Negative Feedback and Random Connectivities in Multitype Neuronal Network Models." Ohio University / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou16049254273607.
Faria, Rodrigo Augusto Dias. "Human skin segmentation using correlation rules on dynamic color clustering." Universidade de São Paulo, 2018. http://www.teses.usp.br/teses/disponiveis/45/45134/tde-01102018-101814/.
Human skin consists of a series of distinct layers, each of which reflects a portion of the incident light after the pigments in that layer have absorbed a certain amount of it. The main pigments responsible for skin color are melanin and hemoglobin. Skin segmentation plays an important role in a wide range of applications in image processing and computer vision. In short, there are three main approaches to skin segmentation: rule-based, machine learning, and hybrid. They differ in terms of accuracy and computational efficiency. Generally, machine learning and hybrid approaches outperform rule-based methods, but they require a large and representative training data set and sometimes also a costly classification time, which can be a deciding factor for real-time applications. In this work, we propose an improvement, in three distinct versions, of a novel rule-based skin segmentation method that works in the YCbCr color space. Our motivation is based on the hypotheses that (1) the original rule can be complemented and (2) human skin pixels do not appear isolated, that is, neighborhood operations are taken into account. The method is a combination of correlation rules based on these hypotheses. These rules evaluate combinations of the Cb and Cr chrominance values to identify skin pixels, depending on the shape and size of the dynamically generated skin-color clusters. The method is very efficient in terms of computational effort and robust on very complex images.
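The abstract does not give the thesis's correlation rules. As a stand-in, this sketch converts 8-bit RGB to full-range BT.601 (JPEG) chrominance, applies the widely used fixed Cb/Cr box of Chai and Ngan (77 ≤ Cb ≤ 127, 133 ≤ Cr ≤ 173), and then keeps a pixel only if enough of its neighbors also pass, echoing the hypothesis that skin pixels do not appear in isolation. Thresholds, the neighbor count, and function names are assumptions for illustration.

```python
def rgb_to_cbcr(r, g, b):
    """Chrominance of an 8-bit RGB pixel, full-range ITU-R BT.601 (JPEG/JFIF)."""
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr


def is_skin_color(r, g, b):
    """Fixed Cb/Cr box rule (Chai and Ngan thresholds), used as a stand-in."""
    cb, cr = rgb_to_cbcr(r, g, b)
    return 77 <= cb <= 127 and 133 <= cr <= 173


def skin_mask(image, min_neighbors=2):
    """image: list of rows of (r, g, b) tuples. A pixel is kept only if it
    passes the color rule and at least `min_neighbors` of its 8 neighbors
    pass it too, so isolated skin-colored pixels are discarded."""
    h, w = len(image), len(image[0])
    raw = [[is_skin_color(*image[y][x]) for x in range(w)] for y in range(h)]

    def skin_neighbors(y, x):
        return sum(raw[y + dy][x + dx]
                   for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                   if (dy or dx) and 0 <= y + dy < h and 0 <= x + dx < w)

    return [[raw[y][x] and skin_neighbors(y, x) >= min_neighbors
             for x in range(w)] for y in range(h)]
```

A fixed box ignores luminance Y entirely, which keeps the rule cheap; the dynamic clustering described in the abstract would instead adapt the accepted Cb/Cr region to each image.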
Steinberg, Daniel. "An Unsupervised Approach to Modelling Visual Data." Thesis, The University of Sydney, 2013. http://hdl.handle.net/2123/9415.
Borke, Lukas. "Dynamic Clustering and Visualization of Smart Data via D3-3D-LSA." Doctoral thesis, Humboldt-Universität zu Berlin, 2017. http://dx.doi.org/10.18452/18307.
With the growing popularity of GitHub, the world's largest source-code host and collaboration platform, it has evolved into a Big Data resource offering a variety of Open Source repositories (OSR). At present, there are more than one million organizations on GitHub, among them Google, Facebook, Twitter, Yahoo, CRAN, RStudio, D3, Plotly and many more. GitHub provides an extensive REST API, which enables scientists to retrieve valuable information about software and research development life cycles. Our research pursues two main objectives: (I) to provide an automatic OSR categorization system for data science teams and software developers, promoting discoverability, technology transfer and coexistence; (II) to establish visual data exploration and topic-driven navigation of GitHub organizations for collaborative reproducible research and web deployment. To transform Big Data into value, in other words into Smart Data, storing and processing the data semantics and metadata is essential. Further, the choice of an adequate text mining (TM) model is important. The dynamic calibration of metadata configurations, TM models (VSM, GVSM, LSA), clustering methods and clustering quality indices is referred to as "smart clusterization". Data-Driven Documents (D3) and Three.js (3D) are JavaScript libraries for producing dynamic, interactive data visualizations, featuring hardware acceleration for rendering complex 2D or 3D computer animations of large data sets. Both techniques enable visual data mining (VDM) in web browsers and are abbreviated as D3-3D. Latent Semantic Analysis (LSA) measures semantic information through co-occurrence analysis in the text corpus; its properties and applicability for Big Data analytics are demonstrated. "Smart clusterization" combined with the dynamic VDM capabilities of D3-3D is summarized under the term "Dynamic Clustering and Visualization of Smart Data via D3-3D-LSA".
Hlosta, Martin. "Modul pro shlukovou analýzu systému pro dolování z dat." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2010. http://www.nusl.cz/ntk/nusl-237158.
Pešout, Pavel. "Přístupy k shlukování funkčních dat." Doctoral thesis, Vysoká škola ekonomická v Praze, 2007. http://www.nusl.cz/ntk/nusl-77066.
Korger, Christina. "Clustering of Distributed Word Representations and its Applicability for Enterprise Search." Master's thesis, Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2016. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-208869.
Sandoval, Arenas Santiago. "Revisiting stormwater quality conceptual models in a large urban catchment : Online measurements, uncertainties in data and models." Thesis, Lyon, 2017. http://www.theses.fr/2017LYSEI089/document.
Total Suspended Solids (TSS) stormwater models in urban drainage systems are often required for scientific, legal, environmental and operational reasons. However, traditional TSS stormwater model structures have been widely questioned, especially when reproducing data from online measurements at the outlet of large urban catchments. In this thesis, three potential limitations of traditional TSS stormwater models are analyzed in a 185 ha urban catchment (Chassieu, Lyon, France) by means of 365 rainfall events monitored online: a) uncertainties in TSS data due to field conditions; b) uncertainties in hydrological models and rainfall measurements; and c) uncertainties in the stormwater quality model structures. These aspects are investigated in six separate contributions, whose principal results can be summarized as follows: a) TSS data acquisition and validation: (i) four sampling strategies during rainfall events are simulated and evaluated using online TSS and flow rate measurements. Sampling intervals of 5 min are recommended; average sampling errors range between 7 % and 20 %, and uncertainties in sampling errors are about 5 %, depending on the sampling interval; (ii) the probability of underestimating the cross-section mean TSS concentration is estimated by two methodologies. One method yields more realistic TSS underestimations (about 39 %) than the other (about 269 %). b) Hydrological models and rainfall measurements: (iii) a parameter estimation strategy is proposed for a conceptual rainfall-runoff model by analyzing the variability of the optimal parameters obtained from single-event Bayesian calibrations, based on cluster and graph representations. The new strategy performs better in terms of accuracy and precision in validation; (iv) a methodology aimed at calculating “mean” areal rainfall estimates is proposed, based on the same hydrological model and flow rate data.
Rainfall estimates obtained with multiplying factors over constant-length time windows, with zero rainfall records filled in by a reverse model, give the most satisfactory results compared with the other rainfall estimation models. c) Stormwater TSS pollutograph modelling: (v) the modelling performance of the traditional Rating Curve (RC) model is superior to that of different linear Transfer Function models (TFs), especially in terms of parsimony and precision of the simulations. No relation could be established between the rainfall corrections or hydrological conditions defined in (iii) and (iv) and the performance of RC and TFs. Statistical tests support the conclusion that the occurrence over time of events not representable by the RC model is independent of antecedent dry weather conditions; (vi) a Bayesian reconstruction method of virtual state variables indicates that potential missing processes in the RC description are hardly interpretable as a unique state of virtual available mass over the catchment decreasing over time, as assumed by a great number of traditional models.
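A common way to quantify a sampling error like the one in contribution (i) is to compare the flow-weighted event mean concentration, EMC = Σ C_t Q_t / Σ Q_t, computed from the full online series with the EMC computed from a subsample taken every few time steps. The sketch assumes evenly spaced online data and simple time-proportional subsampling; the thesis's four sampling strategies are not reproduced, and the function names are hypothetical.

```python
# Sketch: relative EMC error when only every `step`-th online value is sampled.
# Assumes evenly spaced concentration (C) and flow (Q) series of equal length.
from typing import Sequence


def emc(conc: Sequence[float], flow: Sequence[float]) -> float:
    """Flow-weighted event mean concentration: sum(C*Q) / sum(Q)."""
    return sum(c * q for c, q in zip(conc, flow)) / sum(flow)


def sampling_error(conc: Sequence[float], flow: Sequence[float], step: int) -> float:
    """Relative error (%) of the subsampled EMC with respect to the full-series EMC."""
    ref = emc(conc, flow)
    sub = emc(conc[::step], flow[::step])
    return 100.0 * (sub - ref) / ref
```

For example, with 1-min data, `step=5` mimics one sample every 5 min; repeating the computation over many events gives the distribution of sampling errors.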
D'Angelo, Laura. "Bayesian modeling of calcium imaging data." Doctoral thesis, Università degli Studi di Padova, 2022. https://hdl.handle.net/10281/399067.
Perronnet, Caroline. "Etude de thérapies génique et pharmacologique visant à restaurer les capacités cognitives d’un modèle murin de la Dystrophie musculaire de Duchenne." Thesis, Paris 11, 2011. http://www.theses.fr/2011PA112009.
Therapies have been developed to treat Duchenne muscular dystrophy (DMD, caused by mutations in the dystrophin gene), but their ability to restore the cognitive deficits associated with this syndrome has not yet been studied. We explored two therapeutic approaches to compensate for the brain alterations resulting from the loss of dystrophin in the mdx mouse, a model of DMD. A pharmacological approach based on the overexpression of utrophin, a dystrophin homologue, does not alleviate the behavioural deficits in these mice. In contrast, a genetic intervention based on splicing out the mutated exon restores endogenous dystrophin and rescues brain alterations such as the clustering of GABAA receptors and hippocampal synaptic plasticity in mdx mice. These results suggest a role for dystrophin in adult brain plasticity and indicate that this gene therapy approach is applicable to the treatment of cognitive impairments in DMD.
Gorin, Arseniy. "Structuration du modèle acoustique pour améliorer les performance de reconnaissance automatique de la parole." Thesis, Université de Lorraine, 2014. http://www.theses.fr/2014LORR0161/document.
This thesis focuses on acoustic model structuring for improving HMM-based automatic speech recognition. The structuring relies on unsupervised clustering of speech utterances of the training data in order to handle speaker and channel variability. The idea is to split the data into acoustically similar classes. In the conventional multi-modeling (or class-based) approach, separate class-dependent models are built via adaptation of a speaker-independent model. When the number of classes increases, less data becomes available for the estimation of the class-based models, and the parameters are less reliable. One way to handle this problem is to modify the classification criterion applied to the training data, allowing a given utterance to belong to more than one class. This is obtained by relaxing the classification decision through a soft margin, which is investigated in the first part of the thesis. In the main part of the thesis, a novel approach is proposed that uses the clustered data more efficiently in a class-structured GMM. Instead of adapting all HMM-GMM parameters separately for each class of data, the class information is explicitly introduced into the GMM structure by associating a given density component with a given class. To efficiently exploit such a structured HMM-GMM, two different approaches are proposed. The first approach combines a class-structured GMM with class-dependent mixture weights. In this model, the Gaussian components are shared across speaker classes but are class-structured, and the mixture weights are class-dependent. For decoding an utterance, the set of mixture weights is selected according to the estimated class. In the second approach, the mixture weights are replaced by density component transition probabilities. The proposed approaches are analyzed and evaluated on various speech data covering different sources of variability (age, gender, accent and noise).
Hasnat, Md Abul. "Unsupervised 3D image clustering and extension to joint color and depth segmentation." Thesis, Saint-Etienne, 2014. http://www.theses.fr/2014STET4013/document.
Access to 3D images at a reasonable frame rate is now widespread, thanks to recent advances in low-cost depth sensors as well as efficient methods to compute 3D from 2D images. As a consequence, there is strong demand to enhance existing computer vision applications by incorporating 3D information. Indeed, numerous studies have demonstrated that the accuracy of different tasks increases when 3D information is included as an additional feature. However, for the task of indoor scene analysis and segmentation, several important issues remain, such as: (a) how can the 3D information itself be exploited? and (b) what is the best way to fuse color and 3D in an unsupervised manner? In this thesis, we address these issues and propose novel unsupervised methods for 3D image clustering and joint color and depth image segmentation. To this aim, we consider image normals as the prominent feature of a 3D image and cluster them with methods based on finite statistical mixture models. We consider the Bregman Soft Clustering method to ensure computationally efficient clustering. Moreover, we exploit several probability distributions from directional statistics, such as the von Mises-Fisher distribution and the Watson distribution. By combining these, we propose novel Model Based Clustering methods. We empirically validate these methods using synthetic data and then demonstrate their application to 3D/depth image analysis. Afterward, we extend these methods to segment synchronized 3D and color images, also called RGB-D images. To this aim, we first propose a statistical image generation model for RGB-D images. Then, we propose a novel RGB-D segmentation method using joint color-spatial-axial clustering and a statistical planar region merging method. Results show that the proposed method is comparable with state-of-the-art methods and requires less computation time.
Moreover, it opens interesting perspectives for fusing color and geometry in an unsupervised manner. We believe that the methods proposed in this thesis are equally applicable and extendable to clustering other types of data, such as speech and gene expression data. Moreover, they can be used for complex tasks, such as joint image-speech data analysis.
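The thesis clusters surface normals with mixtures of directional distributions such as the von Mises-Fisher (vMF) distribution. As a simplified sketch, assuming all components share one fixed, known concentration κ (so the vMF normalizing constant cancels in the responsibilities), soft EM for the mixing weights and mean directions reduces to the loop below. Estimating κ, the Watson distribution and the Bregman soft clustering formulation used in the thesis are not shown; `vmf_soft_cluster` and its defaults are hypothetical.

```python
# Simplified soft EM for a mixture of von Mises-Fisher distributions on the unit
# sphere, ASSUMING one fixed concentration kappa shared by all components.
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def normalize(v: Sequence[float]) -> Vec:
    n = math.sqrt(sum(c * c for c in v))
    return (v[0] / n, v[1] / n, v[2] / n)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def vmf_soft_cluster(normals: List[Vec], init_means: List[Vec],
                     kappa: float = 20.0, n_iter: int = 20):
    means = [normalize(m) for m in init_means]
    k = len(means)
    weights = [1.0 / k] * k
    resp: List[List[float]] = []
    for _ in range(n_iter):
        # E-step: r_ik proportional to pi_k * exp(kappa * mu_k . x_i); the shared C(kappa) cancels.
        resp = []
        for x in normals:
            logs = [math.log(weights[j]) + kappa * dot(means[j], x) for j in range(k)]
            mx = max(logs)
            ex = [math.exp(l - mx) for l in logs]  # log-sum-exp trick for stability
            s = sum(ex)
            resp.append([e / s for e in ex])
        # M-step: mixing weights, and mean directions as normalized weighted sums.
        for j in range(k):
            rj = [r[j] for r in resp]
            weights[j] = sum(rj) / len(normals)
            acc = [sum(r * x[d] for r, x in zip(rj, normals)) for d in range(3)]
            means[j] = normalize(acc)
    return means, weights, resp
```

Fixing κ keeps the example short; a full vMF mixture would also update each κ_k, which requires an approximation involving ratios of modified Bessel functions.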
Schmutz, Amandine. "Contributions à l'analyse de données fonctionnelles multivariées, application à l'étude de la locomotion du cheval de sport." Thesis, Lyon, 2019. http://www.theses.fr/2019LYSE1241.
With the growth of the smart-device market, which provides athletes and trainers with a systematic, objective and reliable follow-up, more and more parameters are monitored for the same individual. An alternative to laboratory evaluation methods is the use of inertial sensors, which allow performance to be followed without hindering it, without space limits and without tedious initialization procedures. Data collected by these sensors can be classified as multivariate functional data: quantitative entities evolving over time and collected simultaneously for the same individual. The aim of this thesis is to find parameters for analysing the locomotion of sport horses using a sensor placed in the saddle. This connected device (inertial sensor, IMU) for equestrian sports records acceleration and angular velocity over time in the three spatial directions at a sampling frequency of 100 Hz. The database used for model development consists of 3221 canter strides from 58 ridden jumping horses of different ages and competition levels. Two different protocols are used to collect data: one for straight paths and one for curved paths. We restricted our work to the prediction of three parameters: the speed per stride, the stride length and the jump quality. To meet the first two objectives, we developed a multivariate functional clustering method that divides the database into smaller, more homogeneous sub-groups from the point of view of the collected signals. This method characterizes each group by its average profile, which eases data understanding and interpretation. Surprisingly, however, this clustering model did not improve the speed prediction results: Support Vector Machine (SVM) is the model with the lowest percentage of errors above 0.6 m/s. The same applies to stride length, where an accuracy of 20 cm is reached with the SVM model.
These results can be explained by the fact that our database is built from only 58 horses, which is quite a low number of individuals for a clustering method. We then extend this method to the co-clustering of multivariate functional data in order to ease data mining of horse follow-up databases. This method might allow the detection and prevention of locomotor disturbances, the main cause of interrupted activity in jumping horses. Lastly, we looked for correlations between jumping quality and the signals collected by the IMU. First results show that signals collected by the saddle sensor alone are not sufficient to finely differentiate jumping quality. Additional information will be needed, for example from complementary sensors or by expanding the database to include a more diverse range of horses and jump profiles.
Ailem, Melissa. "Sparsity-sensitive diagonal co-clustering algorithms for the effective handling of text data." Thesis, Sorbonne Paris Cité, 2016. http://www.theses.fr/2016USPCB087.
In the current context, there is a clear need for Text Mining techniques to analyse the huge quantity of unstructured text documents available on the Internet. These textual data are often represented by sparse, high-dimensional matrices whose rows and columns represent documents and terms, respectively. It would therefore be worthwhile to group these terms and documents simultaneously into meaningful clusters, making this substantial amount of data easier to handle and interpret. Co-clustering techniques serve exactly this purpose. Although many existing co-clustering approaches have been successful in revealing homogeneous blocks in several domains, these techniques are still challenged by the high dimensionality and sparsity of document-term matrices. Due to this sparsity, several co-clusters are composed primarily of zeros. While homogeneous, these co-clusters are irrelevant and must be filtered out in a post-processing step to keep only the most significant ones. The objective of this thesis is to propose new co-clustering algorithms tailored to take these sparsity-related issues into account. The proposed algorithms seek a block diagonal structure and directly identify the most useful co-clusters, which makes them especially effective for the text co-clustering task. Our contributions can be summarized as follows. First, we introduce and demonstrate the effectiveness of a novel co-clustering algorithm based on a direct maximization of graph modularity. While existing graph-based co-clustering algorithms rely on spectral relaxation, the proposed algorithm uses an iterative alternating optimization procedure to reveal the most meaningful co-clusters in a document-term matrix. Moreover, the proposed optimization has the advantage of avoiding the computation of eigenvectors, a task which is prohibitive for high-dimensional data.
This is an improvement over spectral approaches, where eigenvector computation is necessary to perform the co-clustering. Second, we use an even more powerful approach to discover block diagonal structures in document-term matrices. We rely on mixture models, which offer strong theoretical foundations and considerable flexibility that make it possible to uncover various specific cluster structures. More precisely, we propose a rigorous probabilistic model based on the Poisson distribution and the well-known Latent Block Model. Interestingly, this model includes sparsity in its formulation, which makes it particularly effective for text data. By estimating this model's parameters under the Maximum Likelihood (ML) and Classification Maximum Likelihood (CML) approaches, four co-clustering algorithms are proposed: a hard, a soft and a stochastic variant, plus a fourth algorithm that simultaneously leverages the benefits of both the soft and stochastic variants. As a last contribution of this thesis, we propose a new biomedical text mining framework that includes some of the above-mentioned co-clustering algorithms. This work shows the contribution of co-clustering to a real biomedical text mining problem. The proposed framework can offer new clues about the results of genome-wide association studies (GWAS) by mining PUBMED abstracts. This framework has been tested on asthma; it made it possible to assess the strength of associations between asthma genes reported in previous GWAS and to discover new candidate genes likely associated with asthma. In a nutshell, while several text co-clustering algorithms already exist, their performance can be substantially increased if more appropriate models and algorithms are available. Based on extensive experiments on several challenging real-world text data sets, we believe that this thesis has served this objective well.
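The first contribution maximizes graph modularity directly with an alternating procedure instead of a spectral relaxation. A minimal sketch of that idea, assuming the usual bipartite modularity for a document-term matrix A with total weight N, Q = (1/N) Σ_ij (a_ij - a_i· a_·j / N) [z_i = w_j], where z and w are row and column cluster labels: each pass reassigns every row given the column labels, then every column given the row labels, each choice maximizing its own contribution to Q, so Q never decreases. The function names, the initialization and the fixed iteration count are illustrative assumptions, not the thesis's exact algorithm.

```python
# Sketch of modularity-based diagonal co-clustering with alternating row/column
# updates (assumed bipartite modularity; not the thesis's exact algorithm).
from typing import List


def modularity_matrix(a: List[List[float]]) -> List[List[float]]:
    """B_ij = a_ij - a_i. * a_.j / N, with N the total weight of A."""
    n = sum(map(sum, a))
    row = [sum(r) for r in a]
    col = [sum(c) for c in zip(*a)]
    return [[a[i][j] - row[i] * col[j] / n for j in range(len(col))]
            for i in range(len(row))]


def modularity(a: List[List[float]], z: List[int], w: List[int]) -> float:
    """Q for row labels z and column labels w (block k pairs row and column cluster k)."""
    b = modularity_matrix(a)
    n = sum(map(sum, a))
    return sum(b[i][j] for i in range(len(z)) for j in range(len(w))
               if z[i] == w[j]) / n


def co_cluster(a: List[List[float]], k: int, w_init: List[int], n_iter: int = 10):
    b = modularity_matrix(a)
    w = list(w_init)
    z: List[int] = []
    for _ in range(n_iter):
        # Rows: choose the cluster whose columns give the largest modularity contribution.
        z = [max(range(k), key=lambda c: sum(bij for bij, wj in zip(bi, w) if wj == c))
             for bi in b]
        # Columns: same rule, using the freshly updated row labels.
        w = [max(range(k), key=lambda c: sum(b[i][j] for i in range(len(z)) if z[i] == c))
             for j in range(len(w))]
    return z, w
```

No eigenvectors are computed: each update only needs row- or column-wise sums of the modularity matrix, which is the property the abstract highlights. A sparse implementation would avoid materializing B explicitly.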
Tekieh, Mohammad Hossein. "Analysis of Healthcare Coverage Using Data Mining Techniques." Thèse, Université d'Ottawa / University of Ottawa, 2012. http://hdl.handle.net/10393/20547.
Rastelli, Riccardo, and Nial Friel. "Optimal Bayesian estimators for latent variable cluster models." Springer Nature, 2018. http://dx.doi.org/10.1007/s11222-017-9786-y.
Majed, Aliah. "Sensing-based self-reconfigurable strategies for autonomous modular robotic systems." Electronic Thesis or Diss., Brest, École nationale supérieure de techniques avancées Bretagne, 2022. http://www.theses.fr/2022ENTA0013.
Modular robotic systems (MRSs) have become a highly active research area. They can change the perspective on robotic systems from machines designed to do certain tasks to multipurpose tools capable of accomplishing almost any task. They are used in a wide range of applications, including reconnaissance, rescue missions, space exploration and military tasks. An MRS is typically built from “modules”, ranging from a few to several hundred or even thousands. Each module includes actuators, sensors, and computational and communication capabilities. Usually, these systems are homogeneous, with all modules identical; however, there can also be heterogeneous systems that contain different modules to maximize versatility. One of the advantages of these systems is their ability to operate in harsh environments in which contemporary human-in-the-loop working schemes are risky, inefficient and sometimes infeasible. In this thesis, we are interested in self-reconfigurable modular robotics. Such a system uses a set of detectors to continuously sense its surroundings, locate its own position, and then transform into a specific shape to perform the required tasks. Consequently, an MRS faces three major challenges. First, it collects a large amount of data that overloads the robot's memory storage. Second, it generates redundant data, which complicates the controller's decision about the next morphology. Third, the self-reconfiguration process requires massive communication between the modules to reach the target morphology and takes significant processing time. Therefore, researchers' strategies often aim to minimize the amount of data collected by the modules without considerable loss in fidelity.
The goal of this reduction is first to save storage space in the MRS, and then to facilitate data analysis and decision-making about which morphology to use next in order to adapt to new circumstances and perform new tasks. In this thesis, we propose an efficient mechanism for data processing and self-reconfiguration decision-making dedicated to modular robotic systems. More specifically, we focus on data storage reduction, self-reconfiguration decision-making, and efficient communication management between modules in MRSs, with the main goal of ensuring a fast self-reconfiguration process.
Kulhanek, Raymond Daniel. "A Latent Dirichlet Allocation/N-gram Composite Language Model." Wright State University / OhioLINK, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=wright1379520876.
Gallopin, Mélina. "Classification et inférence de réseaux pour les données RNA-seq." Thesis, Université Paris-Saclay (ComUE), 2015. http://www.theses.fr/2015SACLS174/document.
This thesis gathers methodological contributions to the statistical analysis of next-generation high-throughput transcriptome sequencing data (RNA-seq). RNA-seq data are discrete, and the number of sequenced samples is usually small due to the cost of the technology. These two points are the main statistical challenges for modelling RNA-seq data. The first part of the thesis is dedicated to the co-expression analysis of RNA-seq data using model-based clustering. A natural model for discrete RNA-seq data is a Poisson mixture model. However, a Gaussian mixture model in conjunction with a simple transformation applied to the data is a reasonable alternative. We propose to compare the two alternatives using a data-driven criterion to select the model that best fits each dataset. In addition, we present a model selection criterion that takes external gene annotations into account. This model selection criterion is not specific to RNA-seq data; it is useful in any co-expression analysis using model-based clustering designed to enrich functional annotation databases. The second part of the thesis is dedicated to network inference using graphical models. The aim of network inference is to detect relationships among genes based on their expression. We propose a network inference model based on a Poisson distribution that takes into account the discrete nature and high inter-sample variability of RNA-seq data. However, network inference methods require a large number of samples. For Gaussian graphical models, we propose a non-asymptotic approach to detect relevant subsets of genes based on a block-diagonal decomposition of the covariance matrix. This method is not specific to RNA-seq data and reduces the dimension of any network inference problem based on the Gaussian graphical model.
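The first part of the thesis uses Poisson mixture models for discrete RNA-seq counts. As a deliberately simplified sketch (one-dimensional counts, a fixed number of components, and no normalization or library-size factors, which real RNA-seq co-expression models include), EM for a Poisson mixture alternates posterior responsibilities with weighted-mean updates of the rates. `poisson_mixture_em` and its defaults are hypothetical.

```python
# Simplified EM for a univariate K-component Poisson mixture (illustration only).
import math
from typing import List, Tuple


def poisson_logpmf(x: int, lam: float) -> float:
    """log P(X = x) for X ~ Poisson(lam)."""
    return x * math.log(lam) - lam - math.lgamma(x + 1)


def poisson_mixture_em(xs: List[int], init_rates: List[float],
                       n_iter: int = 100) -> Tuple[List[float], List[float]]:
    k = len(init_rates)
    rates = list(init_rates)
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibilities, computed in log space with the log-sum-exp trick.
        resp = []
        for x in xs:
            logs = [math.log(weights[j]) + poisson_logpmf(x, rates[j]) for j in range(k)]
            m = max(logs)
            ex = [math.exp(l - m) for l in logs]
            s = sum(ex)
            resp.append([e / s for e in ex])
        # M-step: weights are mean responsibilities; rates are responsibility-weighted means.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(xs)
            rates[j] = max(sum(r[j] * x for r, x in zip(resp, xs)) / nj, 1e-10)
    return weights, rates
```

On clearly separated counts such as `[0, 1, 0, 2, 1, 20, 22, 19, 21, 18]` with initial rates `[1.0, 10.0]`, the rates converge to roughly the two group means (0.8 and 20) with equal weights.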
El, Assaad Hani. "Modélisation et classification dynamique de données temporelles non stationnaires." Thesis, Paris Est, 2014. http://www.theses.fr/2014PEST1162/document.
Nowadays, diagnosis and monitoring for predictive maintenance of railway components are key subjects for both operators and manufacturers. They seek to anticipate upcoming maintenance actions, reduce maintenance costs and increase the availability of the rail network. In order to maintain the components at a satisfactory level of operation, a reliable diagnostic strategy must be implemented. In this thesis, we are interested in a main component of railway infrastructure, the railway switch: an important safety device whose failure could heavily impact the availability of the transportation system. The diagnosis of this system is therefore essential and can be done by exploiting sequential measurements acquired successively while the state of the system evolves over time. These measurements consist of power consumption curves acquired during several switch operations. The shape of these curves is indicative of the operating state of the system. The aim is to track the temporal dynamic evolution of the railway component's state under different operating contexts by analyzing these data in order to detect and diagnose problems that may lead to functional failure. This thesis tackles the problem of temporal data clustering within a broader context of developing innovative tools and decision-aid methods. We propose a new dynamic probabilistic approach within a temporal data clustering framework. This approach is based on both Gaussian mixture models and state-space models. The main challenge of this work is the estimation of the model parameters associated with this approach, because of its complex structure. To meet this challenge, a variational approach has been developed. The results obtained on both synthetic and real data highlight the advantage of the proposed algorithms compared with other state-of-the-art methods in terms of clustering and estimation accuracy.
Haider, Peter. "Prediction with Mixture Models." PhD thesis, Universität Potsdam, 2013. http://opus.kobv.de/ubp/volltexte/2014/6961/.
Learning a model of the relationship between the input attributes and the annotated target attributes of data instances serves two purposes. On the one hand, it enables prediction of the target attribute for instances without annotation. On the other hand, the model's parameters can provide useful insights into the structure of the data. If the data have an inherent partition structure, it is natural to reflect this structure in the model. Such mixture models generate predictions by combining the individual predictions of the mixture components, which correspond to the partitions of the data. Often the partition structure is latent and must be inferred jointly while learning the mixture model. A direct evaluation of the accuracy of the inferred partition structure is impossible in many cases, because no ground-truth reference data are available for comparison. However, it can be assessed indirectly by measuring the prediction accuracy of the mixture model based on it. This thesis addresses the interplay between improving prediction accuracy by uncovering latent partitionings in data and evaluating the estimated structure by measuring the accuracy of the resulting prediction model. In the application of filtering unwanted e-mails, the e-mails in the training set are latently partitioned into advertising campaigns. Uncovering this latent structure allows future e-mails to be filtered with very low false-positive rates. In this thesis, a Bayesian partitioning model is developed to model this partitioning structure. Knowing how e-mails are partitioned into campaigns also helps to determine which e-mails were sent at the instigation of the same network of compromised computers, so-called botnets. This is a further layer of latent partitioning.
Uncovering this latent structure makes it possible to increase the accuracy of e-mail filters and to defend effectively against distributed denial-of-service attacks. To this end, this thesis derives a discriminative partitioning model based on the graph of observed e-mails. The partitionings inferred with this model are evaluated by their performance in predicting the campaigns of new e-mails. Furthermore, statistical information about the sending server can be valuable when classifying the content of an e-mail. Learning a model that can use this information requires training data that contain server statistics. In order to also use training data in which the server statistics are missing, a model is developed that is a mixture over potentially all possible completions of them. A further application is predicting the navigation behavior of users of a website. Here, there is no a priori partitioning of the users. However, a partitioning must be created in order to understand different usage scenarios and to design different layouts for them. The presented approach simultaneously optimizes the model's ability both to determine the best partition and to generate behavior predictions using this partition. Each model is evaluated on real data and compared with reference methods. The results show that explicitly modeling the assumptions about the latent partitioning structure leads to improved predictions. In cases where prediction accuracy cannot be optimized directly, adding a small number of higher-level, directly tunable parameters proves useful.
Westerlund, Annie M. "Computational Study of Calmodulin’s Ca2+-dependent Conformational Ensembles." Licentiate thesis, KTH, Biofysik, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-234888.
Dahl, Oskar, and Fredrik Johansson. "Understanding usage of Volvo trucks." Thesis, Högskolan i Halmstad, Akademin för informationsteknologi, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-40826.
This thesis was later written up as a scientific paper and submitted to the ICIMP 2020 conference. The paper was accepted on 23 September 2019 and will be presented in January 2020.
Barceló, Rico Fátima. "Multimodel Approaches for Plasma Glucose Estimation in Continuous Glucose Monitoring. Development of New Calibration Algorithms." Doctoral thesis, Universitat Politècnica de València, 2012. http://hdl.handle.net/10251/17173.
Barceló Rico, F. (2012). Multimodel Approaches for Plasma Glucose Estimation in Continuous Glucose Monitoring. Development of New Calibration Algorithms [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/17173
Palancia