Dissertations / Theses on the topic 'Mixed data types'

To see the other types of publications on this topic, follow the link: Mixed data types.

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Select a source type:

Consult the top 33 dissertations / theses for your research on the topic 'Mixed data types.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Koomson, Obed. "Performance Assessment of The Extended Gower Coefficient on Mixed Data with Varying Types of Functional Data." Digital Commons @ East Tennessee State University, 2018. https://dc.etsu.edu/etd/3512.

Full text
Abstract:
Clustering is a widely used technique in data mining applications to source, manage, analyze and extract vital information from large amounts of data. Most clustering procedures are limited in their performance when it comes to data with mixed attributes. In recent times, mixed data have evolved to include directional and functional data. In this study, we will give an introduction to clustering with an eye towards the application of the extended Gower coefficient by Hendrickson (2014). We will conduct a simulation study to assess the performance of this coefficient on mixed data whose functional component has strictly-decreasing signal curves and also those whose functional component has a mixture of strictly-decreasing signal curves and periodic tendencies. We will assess how four different hierarchical clustering algorithms perform on mixed data simulated under varying conditions with and without weights. The comparison of the various clustering solutions will be done using the Rand Index.
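The classical Gower coefficient that Hendrickson's extension builds on, and the Rand Index used to compare the clustering solutions, are both simple to state. The sketch below is illustrative only: the function names and toy records are ours, and the extended functional version studied in the thesis is not shown.

```python
def gower_similarity(x, y, is_numeric, ranges):
    """Classical Gower similarity between two mixed-type records.

    Numeric attributes contribute 1 - |x - y| / range; categorical
    attributes contribute 1 on a match, 0 otherwise; the result is
    the average contribution across attributes.
    """
    scores = []
    for xi, yi, num, rng in zip(x, y, is_numeric, ranges):
        if num:
            scores.append(1.0 - abs(xi - yi) / rng)
        else:
            scores.append(1.0 if xi == yi else 0.0)
    return sum(scores) / len(scores)

def rand_index(labels_a, labels_b):
    """Fraction of object pairs on which two clusterings agree
    (both together or both apart)."""
    n = len(labels_a)
    agree, pairs = 0, 0
    for i in range(n):
        for j in range(i + 1, n):
            pairs += 1
            same_a = labels_a[i] == labels_a[j]
            same_b = labels_b[i] == labels_b[j]
            agree += (same_a == same_b)
    return agree / pairs

# Toy records: (age, income, colour); ranges only matter for numeric columns.
x = (35, 50000, "red")
y = (45, 60000, "red")
s = gower_similarity(x, y, [True, True, False], [50, 100000, None])
```

With these toy ranges the numeric parts contribute 0.8 and 0.9 and the matching colour contributes 1, so the similarity is their average.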
APA, Harvard, Vancouver, ISO, and other styles
2

Apitzsch, Cecilia, and Josefin Ryeng. "Cluster Analysis of Mixed Data Types in Credit Risk : A study of clustering algorithms to detect customer segments." Thesis, Umeå universitet, Institutionen för matematik och matematisk statistik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-172594.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

GIORDANI, ILARIA. "Relational clustering for knowledge discovery in life sciences." Doctoral thesis, Università degli Studi di Milano-Bicocca, 2010. http://hdl.handle.net/10281/7830.

Full text
Abstract:
Clustering is one of the most common machine learning techniques and has been widely applied in genomics, proteomics and, more generally, in the life sciences. In particular, clustering is an unsupervised technique that, based on geometric concepts like distance or similarity, partitions objects into groups, such that objects with similar characteristics are clustered together and dissimilar objects are in different clusters. In many domains where clustering is applied, some background knowledge is available in different forms: labelled data (specifying the category to which an instance belongs); complementary information about the "true" similarity between pairs of objects or about the relationship structure present in the input data; user preferences (for example, specifying whether two instances should be in the same or different clusters). In particular, in many real-world applications like biological data processing, social network analysis and text mining, data do not exist in isolation, but a rich structure of relationships subsists between them. A simple example can be seen in the biological domain, where there are a lot of relationships between genes and proteins based on many experimental conditions. Another, perhaps more familiar, example is the Web search domain, where there are relations between documents and words in a text or between web pages, search queries and web users. Our research focuses on how this background knowledge can be incorporated into traditional clustering algorithms to optimize the process of pattern discovery (clustering) between instances.
In this thesis, we first provide an overview of traditional clustering methods with some important distance measures, and then we analyze three particular challenges that we try to overcome with different proposed methods: "feature selection", to reduce the high-dimensional input space and remove noise from the data; "mixed data types", to handle both numeric and categorical values in the clustering procedure, as is typical of life science applications; and finally "knowledge integration", to improve the semantic value of clustering by incorporating background knowledge. Regarding the first challenge, we propose a novel approach based on the use of genetic programming, an evolutionary algorithm-based methodology, to automatically perform feature selection. Different clustering algorithms have been investigated for the second challenge; a modified version of a particular algorithm is proposed and applied to clinical data. Particular attention is given to the final challenge, the most important objective of this thesis: the development of a new relational clustering framework to improve the semantic value of clustering by taking into account, in the clustering algorithm, relationships learned from background knowledge. We investigate and classify existing clustering methods into two principal categories: structure-driven approaches, which are bound to the data structure, where the data clustering problem is tackled from several dimensions, such as clustering columns and rows of a given dataset concurrently, as in biclustering or vertical 3-D clustering; and knowledge-driven approaches, where domain information is used to drive the clustering process and interpret its results, such as semi-supervised clustering, which uses both labelled and unlabelled data and has attracted significant attention. This kind of clustering algorithm represents the first step towards implementing the proposed general framework, which falls into this category.
In particular, the thesis focuses on the development of a general framework for relational clustering, instantiating it for three different life science applications: the first with the aim of finding groups of genes with similar behaviour with respect to their expression and regulatory profiles. The second is a pharmacogenomics application, in which the relational clustering framework is applied to a benchmark dataset (NCI60) to match drug treatments to a given cell line based on both drug activity patterns and gene expression profiles. Finally, the proposed framework is applied to clinical data: a dataset containing different information about patients in anticoagulant therapy has been analyzed to find groups of patients with similar behaviour and responses to the therapy.
APA, Harvard, Vancouver, ISO, and other styles
4

Abouzeid, Shadi. "A visual interactive grouping analysis tool (VIGAT) that takes mixed data types as input and provides visually interactive overlapping groups as output." Thesis, University of Strathclyde, 2004. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.401309.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Sun, Jinhui. "Robust Feature Screening Procedures for Mixed Type of Data." Diss., Virginia Tech, 2016. http://hdl.handle.net/10919/73709.

Full text
Abstract:
High dimensional data have been frequently collected in many fields of scientific research and technological development. The traditional idea of best subset selection methods, which use penalized L_0 regularization, is computationally too expensive for many modern statistical applications. A large number of variable selection approaches via various forms of penalized least squares or likelihood have been developed to select significant variables and estimate their effects simultaneously in high dimensional statistical inference. However, in modern applications in areas such as genomics and proteomics, ultra-high dimensional data are often collected, where the dimension of the data may grow exponentially with the sample size. In such problems, the regularization methods can become computationally unstable or even infeasible. To deal with the ultra-high dimensionality, Fan and Lv (2008) proposed a variable screening procedure via correlation learning to reduce dimensionality in sparse ultra-high dimensional models. Since then many authors have further developed the procedure and applied it to various statistical models. However, they all focused on a single type of predictor; that is, the predictors are either all continuous or all discrete. In practice, we often collect mixed types of data, containing both continuous and discrete predictors. For example, in genetic studies, we can collect information on both gene expression profiles and single nucleotide polymorphism (SNP) genotypes. Furthermore, outliers are often present in the observations due to experimental errors and other reasons. Moreover, the true trend underlying the data might not follow the parametric models assumed in many existing screening procedures. Hence a robust screening procedure against outliers and model misspecification is desired. In my dissertation, I shall propose a robust feature screening procedure for mixed types of data.
To gain insights on screening for individual types of data, I first studied feature screening procedures for single type of data in Chapter 2 based on marginal quantities. For each type of data, new feature screening procedures are proposed and simulation studies are performed to compare their performances with existing procedures. The aim is to identify a best robust screening procedure for each type of data. In Chapter 3, I combine these best screening procedures to form the robust feature screening procedure for mixed type of data. Its performance will be assessed by simulation studies. I shall further illustrate the proposed procedure by the analysis of a real example.
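The screening idea the dissertation builds on, Fan and Lv's (2008) sure independence screening, ranks predictors by a marginal statistic and keeps only the top few. A minimal sketch using plain Pearson correlation rather than the robust marginal quantities proposed in the dissertation (the function name and simulated data are ours):

```python
import numpy as np

def sis_screen(X, y, d):
    """Rank predictors by absolute marginal Pearson correlation with y
    and keep the indices of the top d (sure independence screening)."""
    cors = np.array([abs(np.corrcoef(X[:, j], y)[0, 1])
                     for j in range(X.shape[1])])
    return np.argsort(cors)[::-1][:d]

# Sparse model: only columns 0 and 4 carry signal among 50 predictors.
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
y = 3 * X[:, 0] - 2 * X[:, 4] + 0.5 * rng.standard_normal(n)
kept = sis_screen(X, y, d=5)
```

With this signal-to-noise ratio, the two active columns comfortably survive the screen while the dimension drops from 50 to 5.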
Ph. D.
APA, Harvard, Vancouver, ISO, and other styles
6

Engardt, Sara. "Unsupervised learning with mixed type data : for detecting money laundering." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-230891.

Full text
Abstract:
The purpose of this master's thesis is to perform a cluster analysis on parts of Handelsbanken's customer database. The ambition is to explore whether this could aid in identifying customer types at risk of illegal activities such as money laundering. A literature study is conducted to help determine which of the clustering methods described in the literature are most suitable for the problem at hand. The most important constraints of the problem are that the data consist of mixed-type attributes (categorical and numerical) and that outliers are strongly present in the data. An extension of the self-organising map as well as the k-prototypes algorithm was chosen for the clustering. It is concluded that clusters exist in the data, although in the presence of outliers. More work is needed on handling missing values in the dataset.
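The k-prototypes algorithm mentioned above combines squared Euclidean distance on numeric attributes with a weighted count of categorical mismatches (Huang's 1998 dissimilarity). A minimal sketch of that dissimilarity; the records and the weight `gamma` here are illustrative, not from the thesis:

```python
def kprototypes_distance(x, y, num_idx, cat_idx, gamma=1.0):
    """Huang's k-prototypes dissimilarity: squared Euclidean distance
    on the numeric attributes plus gamma times the number of
    mismatched categorical attributes."""
    numeric = sum((x[i] - y[i]) ** 2 for i in num_idx)
    categorical = sum(x[i] != y[i] for i in cat_idx)
    return numeric + gamma * categorical

# Two mixed records: two numeric fields followed by two categorical ones.
d = kprototypes_distance((1.0, 2.0, "a", "b"), (2.0, 2.0, "a", "c"),
                         num_idx=[0, 1], cat_idx=[2, 3], gamma=0.5)
```

Here the numeric part contributes (2 - 1)^2 = 1 and the single categorical mismatch contributes gamma = 0.5; in practice gamma balances the two scales and is often tuned to the numeric attributes' spread.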
APA, Harvard, Vancouver, ISO, and other styles
7

Codd, Casey. "A Review and Comparison of Models and Estimation Methods for Multivariate Longitudinal Data of Mixed Scale Type." The Ohio State University, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=osu1398686513.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Koufakou, Anna. "SCALABLE AND EFFICIENT OUTLIER DETECTION IN LARGE DISTRIBUTED DATA SETS WITH MIXED-TYPE ATTRIBUTES." Doctoral diss., University of Central Florida, 2009. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/3431.

Full text
Abstract:
An important problem that appears often when analyzing data involves identifying irregular or abnormal data points called outliers. This problem broadly arises under two scenarios: when outliers are to be removed from the data before analysis, and when useful information or knowledge can be extracted by the outliers themselves. Outlier Detection in the context of the second scenario is a research field that has attracted significant attention in a broad range of useful applications. For example, in credit card transaction data, outliers might indicate potential fraud; in network traffic data, outliers might represent potential intrusion attempts. The basis of deciding if a data point is an outlier is often some measure or notion of dissimilarity between the data point under consideration and the rest. Traditional outlier detection methods assume numerical or ordinal data, and compute pair-wise distances between data points. However, the notion of distance or similarity for categorical data is more difficult to define. Moreover, the size of currently available data sets dictates the need for fast and scalable outlier detection methods, thus precluding distance computations. Additionally, these methods must be applicable to data which might be distributed among different locations. In this work, we propose novel strategies to efficiently deal with large distributed data containing mixed-type attributes. Specifically, we first propose a fast and scalable algorithm for categorical data (AVF), and its parallel version based on MapReduce (MR-AVF). We extend AVF and introduce a fast outlier detection algorithm for large distributed data with mixed-type attributes (ODMAD). Finally, we modify ODMAD in order to deal with very high-dimensional categorical data. 
Experiments with large real-world and synthetic data show that the proposed methods exhibit large performance gains and high scalability compared to the state-of-the-art, while achieving similar detection accuracy.
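The AVF score described above is simple: average, over a record's attributes, the dataset frequency of each of its values; records whose values are rare score low and are flagged as outliers. A minimal single-machine sketch (the toy data are ours; the parallel MR-AVF and mixed-type ODMAD variants are not shown):

```python
from collections import Counter

def avf_scores(records):
    """Attribute Value Frequency: for each record, average the dataset
    frequency of each of its attribute values. Records built from rare
    values get low scores, so the lowest-scoring records are candidate
    outliers."""
    m = len(records[0])
    # One frequency table per attribute column.
    freq = [Counter(rec[j] for rec in records) for j in range(m)]
    return [sum(freq[j][rec[j]] for j in range(m)) / m for rec in records]

data = [("red", "small"), ("red", "small"), ("red", "small"), ("blue", "large")]
scores = avf_scores(data)
```

One pass builds the frequency tables and a second scores the records, which is what makes the method fast and easy to distribute.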
Ph.D.
School of Electrical Engineering and Computer Science
Engineering and Computer Science
Computer Engineering PhD
APA, Harvard, Vancouver, ISO, and other styles
9

Chu, Shuyu. "Change Detection and Analysis of Data with Heterogeneous Structures." Diss., Virginia Tech, 2017. http://hdl.handle.net/10919/78613.

Full text
Abstract:
Heterogeneous data with different characteristics are ubiquitous in the modern digital world. For example, the observations collected from a process may change in mean or variance. In numerous applications, data are often of mixed types, including both discrete and continuous variables. Heterogeneity also commonly arises in data when the underlying models vary across different segments. Besides, the underlying pattern of data may change in different dimensions, such as in time and space. The diversity of heterogeneous data structures makes statistical modeling and analysis challenging. Detection of change-points in heterogeneous data has attracted great attention from a variety of application areas, such as quality control in manufacturing, protest event detection in social science, purchase likelihood prediction in business analytics, and organ state change in biomedical engineering. However, due to the extraordinary diversity of heterogeneous data structures and the complexity of the underlying dynamic patterns, change detection and analysis of such data is quite challenging. This dissertation aims to develop novel statistical modeling methodologies to analyze four types of heterogeneous data and to find change-points efficiently. The proposed approaches have been applied to solve real-world problems and can potentially be applied to a broad range of areas.
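One classic building block for the kind of mean-shift detection discussed here is the one-sided CUSUM chart, which accumulates deviations above a reference value and raises an alarm when the sum crosses a threshold. A minimal sketch (the parameter values and toy series are illustrative, not the dissertation's methodology):

```python
def cusum_detect(xs, target_mean, k=0.5, h=5.0):
    """One-sided CUSUM for an upward mean shift: accumulate deviations
    above target_mean + k (the allowance) and return the first index
    where the cumulative sum exceeds the decision threshold h, or
    None if no alarm is raised."""
    s = 0.0
    for i, x in enumerate(xs):
        s = max(0.0, s + (x - target_mean - k))
        if s > h:
            return i
    return None

# In-control noise around 0, then the mean jumps to 2 at index 5.
series = [0.1, -0.2, 0.0, 0.3, -0.1] + [2.0] * 10
alarm = cusum_detect(series, target_mean=0.0)
```

The allowance k trades sensitivity for false alarms: each post-shift point adds 2.0 - 0.5 = 1.5 to the sum, so the threshold h = 5 is crossed a few observations after the true change-point.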
Ph. D.
APA, Harvard, Vancouver, ISO, and other styles
10

Wahi, Rabbani Rash-ha. "Towards an understanding of the factors associated with severe injuries to cyclists in crashes with motor vehicles." Thesis, Queensland University of Technology, 2018. https://eprints.qut.edu.au/121426/1/Rabbani%20Rash-Ha_Wahi_Thesis.pdf.

Full text
Abstract:
This thesis aimed to develop statistical models to overcome limitations in police-reported data to better understand the factors contributing to severe injuries in bicycle motor-vehicle crashes. In low-cycling countries such as Australia, collisions with motor vehicles are the major causes of severe injuries to cyclists and fear of collisions prevents many people from taking up cycling. The empirical results obtained from the models provide valuable insights to assist transport and enforcement agencies to improve cyclist safety.
APA, Harvard, Vancouver, ISO, and other styles
11

Chevallier, Juliette. "Statistical models and stochastic algorithms for the analysis of longitudinal Riemanian manifold valued data with multiple dynamic." Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLX059/document.

Full text
Abstract:
Beyond transversal studies, the temporal evolution of phenomena is a field of growing interest. For the purpose of understanding a phenomenon, it appears more suitable to compare the evolution of its markers over time than to do so at a given stage. The follow-up of neurodegenerative disorders is carried out via the monitoring of cognitive scores over time. The same applies to chemotherapy monitoring: rather than tumor appearance or size, oncologists assess that a given treatment is efficient from the moment it results in a decrease of tumor volume. The study of longitudinal data is not restricted to medical applications and proves successful in various fields of application such as computer vision, automatic detection of facial emotions, social sciences, etc. Mixed effects models have proved their efficiency in the study of longitudinal data sets, especially for medical purposes. Recent works (Schiratti et al., 2015, 2017) allowed the study of complex data, such as anatomical data. The underlying idea is to model the temporal progression of a given phenomenon by continuous trajectories in a space of measurements, which is assumed to be a Riemannian manifold. Then, both a group-representative trajectory and inter-individual variability are estimated. However, these works assume a unidirectional dynamic and fail to encompass situations like multiple sclerosis or chemotherapy monitoring. Indeed, such diseases follow a chronic course, with phases of worsening, stabilization and improvement, inducing changes in the global dynamic. The thesis is devoted to the development of methodological tools and algorithms suited to the analysis of longitudinal data arising from phenomena that undergo multiple dynamics, and to applying them to chemotherapy monitoring. We propose a nonlinear mixed effects model which allows us to estimate a representative piecewise-geodesic trajectory of the global progression, together with spatial and temporal inter-individual variability.
Particular attention is paid to the estimation of the correlation between the different phases of the evolution. This model provides a generic and coherent framework for studying longitudinal manifold-valued data. Estimation is formulated as a well-defined maximum a posteriori problem which we prove to be consistent under mild assumptions. Numerically, due to the non-linearity of the proposed model, the estimation of the parameters is performed through a stochastic version of the EM algorithm, namely the Markov chain Monte Carlo stochastic approximation EM (MCMC-SAEM). The convergence of the SAEM algorithm toward local maxima of the observed likelihood has been proved and its numerical efficiency has been demonstrated. However, despite appealing features, the limit position of this algorithm can strongly depend on its starting position. To cope with this issue, we propose a new version of the SAEM in which we do not sample from the exact distribution in the expectation phase of the procedure. We first prove the convergence of this algorithm toward local maxima of the observed likelihood. Then, in the spirit of simulated annealing, we propose an instantiation of this general procedure to favor convergence toward global maxima: the tempering-SAEM.
APA, Harvard, Vancouver, ISO, and other styles
12

Elamin, Obbey Ahmed. "Nonparametric kernel estimation methods for discrete conditional functions in econometrics." Thesis, University of Manchester, 2013. https://www.research.manchester.ac.uk/portal/en/theses/nonparametric-kernel-estimation-methods-for-discrete-conditional-functions-in-econometrics(d443e56a-dfb8-4f23-bfbe-ec98ecac030b).html.

Full text
Abstract:
This thesis studies the mixed data types kernel estimation framework for models of discrete dependent variables, which are known as kernel discrete conditional functions. The conventional parametric multinomial logit (MNL) model is compared with the mixed data types kernel conditional density estimator in Chapter 2. A new kernel estimator for discrete time single state hazard models is developed in Chapter 3 and named the discrete time "external kernel hazard" estimator. The discrete time (mixed) proportional hazard estimators are then compared with the discrete time external kernel hazard estimator empirically in Chapter 4. The work in Chapter 2 estimates a labour force participation decision model using cross-section data from the UK labour force survey in 2007. The work in Chapter 4 estimates a hazard rate for job vacancies in weeks, using data from the Lancashire Careers Service (LCS) between March 1988 and June 1992. Evidence from the vast literature on female labour force participation and the job-market random matching theory is used to examine the empirical results of the estimators. The parametric estimators are constrained by restrictive assumptions regarding the link function of the discrete dependent variable and the dummy variables for the discrete covariates. Adding interaction terms improves the performance of the parametric models but incurs other risks, such as introducing multicollinearity, increasing the singularity of the data matrix and complicating the computation of the maximum likelihood function. On the other hand, the mixed data types kernel estimation framework shows an outstanding performance compared with the conventional parametric estimation methods. The kernel functions used for the discrete variables, including the dependent variable, in the mixed data types estimation framework have substantially improved the performance of the kernel estimators.
The kernel framework uses very few assumptions about the functional form of the variables in the model and relies on the right choice of kernel functions in the estimator. The outcomes of the kernel conditional density show that female education level and fertility have a strong impact on females' propensity to work and be in the labour force. The kernel conditional density estimator captures more heterogeneity among the females in the sample than the MNL model, due to the restrictive parametric assumptions in the latter. The (mixed) proportional hazard framework, on the other hand, fails to capture the effect of job-market tightness on the job-vacancy hazard rate and produces inconsistent results when the assumptions regarding the distribution of the unobserved heterogeneity are changed. The external kernel hazard estimator overcomes those problems and produces results consistent with the job-market random matching theory. The results in this thesis are useful for nonparametric estimation research in econometrics and in labour economics research.
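A common device in mixed-data kernel estimation is a product kernel: a continuous kernel (e.g. Gaussian) on the numeric covariates multiplied by a discrete kernel, such as the Aitchison-Aitken kernel, on the categorical ones. A minimal Nadaraya-Watson regression sketch under that setup; the bandwidths, data and function names are illustrative, and this is not the thesis's external kernel hazard estimator:

```python
import math

def aitchison_aitken(x, xi, num_categories, lam):
    """Discrete kernel: weight 1 - lam on a match, with lam spread
    evenly over the remaining num_categories - 1 values."""
    return 1.0 - lam if x == xi else lam / (num_categories - 1)

def gaussian(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def nw_mixed(x_num, x_cat, data, h, lam, c):
    """Nadaraya-Watson regression with a product kernel: Gaussian on
    the numeric covariate, Aitchison-Aitken on the categorical one."""
    num = den = 0.0
    for zi_num, zi_cat, yi in data:
        w = gaussian((x_num - zi_num) / h) * aitchison_aitken(x_cat, zi_cat, c, lam)
        num += w * yi
        den += w
    return num / den

# (numeric covariate, categorical covariate, response) triples.
data = [(0.0, "m", 1.0), (0.1, "m", 1.2), (1.0, "f", 3.0)]
yhat = nw_mixed(0.05, "m", data, h=0.5, lam=0.1, c=2)
```

As lam shrinks to 0 the discrete kernel approaches exact matching, while larger lam borrows strength across categories; bandwidth selection for both h and lam is what the nonparametric literature spends most of its effort on.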
APA, Harvard, Vancouver, ISO, and other styles
13

Bushel, Pierre Robert. "Clustering of mixed data types with application to toxicogenomics." 2005. http://www.lib.ncsu.edu/theses/available/etd-03172005-091928/unrestricted/etd.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
14

Huang, Pei-Yuan (黃沛媛). "Fuzzy Clustering Algorithms for the Mixed Types of Data." Thesis, 2001. http://ndltd.ncl.edu.tw/handle/12616104324570956371.

Full text
Abstract:
Master's thesis
Chung Yuan Christian University
Graduate Institute of Mathematics
ROC year 89 (2000)
There are several methods for clustering data, such as divisive, hierarchical, k-means, and fuzzy c-means methods, etc. However, these methods are mostly used for numerical data. There are few documents dealing with mixed types of numerical, symbolic and fuzzy data. This thesis presents fuzzy clustering algorithms for mixed types of data (i.e., composed of numerical, symbolic, and fuzzy data) by adopting fuzzy c-means (FCM) [4]. It is mainly based on the definition of symbolic distance proposed by Diday [3] and Gowda & Diday [5,6], and also on the definition of fuzzy distance proposed by Hathaway & Bezdek [8]. During the process, these two distances were found to run against intuition in some cases. Therefore, an appropriate amendment is made, yielding better results. Finally, a real example involving mixed types of data is given. By adopting the method proposed in this thesis, good results are generated. That is, the proposed method can be adopted to classify both mixed-type and single-type data.
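The fuzzy c-means machinery the thesis adapts updates each object's cluster memberships from its distances to the cluster prototypes. A minimal sketch of Bezdek's membership update (the distances here are illustrative; the symbolic and fuzzy distance components of the proposed method are not shown):

```python
def fcm_memberships(distances, m=2.0):
    """Fuzzy c-means membership update: given one object's distances
    to each cluster prototype, return its membership in each cluster
    (Bezdek's formula with fuzzifier m > 1). Memberships sum to 1."""
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for d in distances:
        total = sum((d / dj) ** exponent for dj in distances)
        memberships.append(1.0 / total)
    return memberships

# One object at distance 1 from prototype 0 and distance 3 from prototype 1.
u = fcm_memberships([1.0, 3.0])
```

With the usual fuzzifier m = 2 the memberships are proportional to inverse squared distance, so the closer prototype here receives membership 0.9 and the farther one 0.1.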
APA, Harvard, Vancouver, ISO, and other styles
15

Hancock, Timothy Peter. "Multivariate consensus trees: tree-based clustering and profiling for mixed data types." Thesis, 2006. https://researchonline.jcu.edu.au/17497/1/01front.pdf.

Full text
Abstract:
Multivariate profiling aims to find groups in a response dataset that are described by relationships with another. Profiling is not predicting each variable within the response set, but finding stable relationships between the two datasets that define common groups. Profiling styles of analysis arise commonly within the context of survey, experimental design and diagnosis types of studies. These studies produce complex multivariate datasets that contain mixed variables, often with missing values, and that require analysis with a flexible, stable statistical technique. The profiling model under consideration within this thesis is a Classification and Regression Tree (CART). A standard CART model finds groups within a univariate response by building a decision tree from a set of predictor variables. The flexible structure of a CART model allows it to be used for either discriminant or regression analysis whilst also catering for mixed types within the predictor set. The goal of this thesis is to develop methods that extend CART to a multivariate response dataset involving mixed data types. Multivariate regression trees (MRT) have recently been shown to be a powerful profiling and clustering tool. However, the same success in extending CART to multivariate classification and multivariate mixed-type analysis has yet to be realised. To begin with, this thesis explores simple extensions to CART for multivariate mixed-type analysis: binary substitution of categorical variables within the response set and partitioning of a distance matrix using Db-MRT. These techniques use already existing extensions to CART methods and are used as comparison methods to gauge the performance of the ensemble and consensus approaches that are the focus of this thesis.
Ensemble models using CART, such as random forests and treeboost, not only improve the overall accuracy of the model predictions but also introduce an ensemble proximity matrix as a measure of similarity between observations of the response set. In this thesis, through MRT, extensions to both random forests and treeboost are developed such that they predict a multivariate response. Furthermore, by binary substitution of the categorical variables within the response set, these multivariate ensemble techniques are further extended to mixed-type profiling. A result of this extension is that the ensemble proximity matrix now describes the groups found within the multivariate response. In this way multivariate tree-based ensembles can be interpreted as a cluster ensemble method, where the ensemble proximity matrices can be seen as cluster ensemble consensus matrices. In this thesis these proximity matrices are found to be powerful visualisation tools, providing improved resolution of the group structure found by a multivariate ensemble method. Moreover, as in cluster ensembles, using these matrices as input to a clustering method improves the accuracy of the groups found. The main work of this thesis is the development of the Multivariate Consensus Tree (MCT) framework for mixed-type profiling. Motivating the MCT approach is the need to further understand which variables relate to the groups observed within the proximity matrix. To do this, MCTs describe three methods to intelligently combine the ensemble proximity matrices of individual responses into one overall consensus matrix. This consensus matrix is a summary of the overall group structure within each individual proximity matrix. As MCTs work solely with proximity matrices, they are independent of the data types of the variables in the response set.
Furthermore, as each response variable is explicitly predicted, it is possible to assess the quality of each proximity matrix in terms of the predictive accuracy of the corresponding ensemble. The MCT consensus matrix is a visualisation tool for the groups present within both the response and predictor datasets. Since a consensus matrix is a similarity matrix, this thesis proposes five new splitting criteria for tree-based models that search for decision rules within variables of the predictor set that partition the consensus matrix into the observed groups. The resulting tree provides a logical decision path that predicts each group. As the groups within the response are now defined by their relationships with the predictor set, the MCT profiling is complete. This thesis proposes two algorithms for building an MCT: global MCTs, which construct an overall consensus matrix spanning all observations and recursively partition on this matrix to build the tree; and local MCTs, which build a new consensus matrix at each terminal node to evaluate each new split. Because MCTs retain the proximity matrices summarising the group structure within each response variable, methods to identify important subgroups within these variables are also proposed. This search for subgroups within the response can be done on two levels: first, to identify subgroups of response variables for overall analysis; and second, to identify subsets of response variables within any specific group found by the MCT. By finding subsets of response variables that relate to specific group structure, the understanding of structure within the dataset is greatly improved. This thesis shows tree-based methods for profiling, in particular MCTs, to be a powerful tool for mixed-type analysis. Firstly, the visualisation of the tree, combined with the proximity matrices, provides a unique view of the groups found and allows for their easy interpretation within the context of the analysis.
Secondly, MCTs are shown to accurately estimate the number of groups and to provide measures of their stability and accuracy. Furthermore, MCTs are found to be resistant to noise variables within the analysis. Finally, they provide methods to find subgroups within the response variables and to identify unimportant variables in the analysis. Throughout this thesis these tree-based methods are compared with standard clustering techniques to provide an accurate benchmark for their performance.
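The ensemble proximity matrices central to this framework have a simple operational definition: two observations are as similar as the fraction of trees in which they share a terminal node. A minimal sketch of that computation (the `leaf_assignments` layout and the toy leaf ids are illustrative assumptions, not the thesis's implementation):

```python
def proximity_matrix(leaf_assignments):
    """Ensemble proximity: the fraction of trees in which two
    observations fall into the same terminal node.

    leaf_assignments: list of lists; leaf_assignments[t][i] is the
    terminal-node id of observation i in tree t (an assumed layout).
    """
    n_trees = len(leaf_assignments)
    n_obs = len(leaf_assignments[0])
    prox = [[0.0] * n_obs for _ in range(n_obs)]
    for leaves in leaf_assignments:
        for i in range(n_obs):
            for j in range(n_obs):
                if leaves[i] == leaves[j]:
                    prox[i][j] += 1.0 / n_trees
    return prox

# Hypothetical leaf ids for 4 observations across 2 trees.
leaves = [
    [0, 0, 1, 1],   # tree 1: obs 0,1 share a leaf; obs 2,3 share a leaf
    [0, 1, 1, 1],   # tree 2: obs 1,2,3 share a leaf
]
prox = proximity_matrix(leaves)
print(prox[0][1])  # 0.5 -- together in 1 of 2 trees
print(prox[2][3])  # 1.0 -- together in both trees
```

Feeding such a matrix into a clustering method, as described above, is what lets the ensemble act as a cluster ensemble.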
APA, Harvard, Vancouver, ISO, and other styles
16

Ching, Billy K. S. "Analysis of longitudinal data of mixed types using a state space model approach." Thesis, 1997. http://hdl.handle.net/2429/6389.

Full text
Abstract:
A new method for multivariate regression analysis of longitudinal data of mixed types is applied to data from a sub-study of the Betaseron multicenter clinical trial in relapsing-remitting multiple sclerosis (MS) (The IFNB Multiple Sclerosis Study Group, 1993). The sub-study is based on a cohort of 52 patients at one center (University of British Columbia) who underwent frequent magnetic resonance imaging (MRI) scans for analysis of disease activity over the first two years of the trial (Paty, Li, the UBC MS/MRI Study Group and the IFNB Multiple Sclerosis Study Group, 1993). We consider a bivariate response vector with two different data types as components: the first component is a positive continuous variable and the second is a count variable. We use a state space model approach based on the Tweedie class of exponential dispersion models, assuming conditional independence of the two components given a latent gamma Markov process. The latent process is interpreted as the underlying severity of the disease, whereas the observations reflect the symptoms. One advantage the new method offers is that it enables the examination of patterns over time: not only can it identify the presence of a treatment effect, but also the nature of that effect. It has been well established in a properly controlled clinical trial that Betaseron substantially alters the natural history of MS (The IFNB Multiple Sclerosis Study Group, 1993). The main objective of this thesis is to illustrate the use of the new method on this data set and to extract additional valuable information from the data.
APA, Harvard, Vancouver, ISO, and other styles
17

Wang, Jiang-Shan, and 王江山. "Improved Learning Vector Quantization for Mixed-Type Data." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/77276310988050953362.

Full text
Abstract:
Master's thesis
National Yunlin University of Science and Technology
Master's Program, Department of Information Management
Academic year 99 (ROC calendar)
With the rapid growth of electronic business, enterprises accumulate large amounts of electronic data, such as customer and transaction information. Most of the data owned by companies nowadays include both categorical and numeric attributes. Learning Vector Quantization (LVQ) is a classification technique that can deal with large amounts of data, making it suitable for enterprise data exploration. Traditional LVQ cannot directly handle categorical data; it requires conversion, typically 1-of-k coding. However, after such a conversion, categorical data cannot keep their original structure, which leads to classification errors. In this study, we propose an improved LVQ (ILVQ) that integrates distance hierarchy for handling mixed-type data. Experiments on synthetic and real-world datasets were conducted, and the results demonstrated the effectiveness of the ILVQ.
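The structural loss caused by 1-of-k coding can be made concrete. A hedged sketch comparing 1-of-k distances with a toy distance hierarchy (the product hierarchy and unit link weights are invented for illustration, not taken from the thesis):

```python
import math

# A tiny, hypothetical distance hierarchy: each node maps to its
# parent, and every link has unit weight.
parent = {
    "coke": "beverage", "pepsi": "beverage",
    "chips": "snack",
    "beverage": "root", "snack": "root",
}

def path_to_root(node):
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def hierarchy_distance(a, b):
    """Tree distance: number of links on the path between a and b."""
    pa, pb = path_to_root(a), path_to_root(b)
    shared = len(set(pa) & set(pb))
    return (len(pa) - shared) + (len(pb) - shared)

def one_of_k_distance(a, b):
    """Euclidean distance after 1-of-k coding: identical for every
    pair of distinct values, so all structure is lost."""
    return 0.0 if a == b else math.sqrt(2.0)

print(hierarchy_distance("coke", "pepsi"))   # 2 -- siblings
print(hierarchy_distance("coke", "chips"))   # 4 -- different branches
print(one_of_k_distance("coke", "pepsi"))    # 1.414..., same as below
print(one_of_k_distance("coke", "chips"))    # 1.414..., structure lost
```

Under 1-of-k coding, "coke" is exactly as far from "pepsi" as from "chips"; the hierarchy keeps the two soft drinks closer together.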
APA, Harvard, Vancouver, ISO, and other styles
18

Jiang, Shun-Mao, and 姜順貿. "Calculation of Dissimilarity Matrix for Mixed-type Data." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/3tnd58.

Full text
Abstract:
Master's thesis
National Chengchi University
Department of Statistics
Academic year 106 (ROC calendar)
Clustering is a common method in data mining. It requires information about the distance between observations, and defining this distance has become a major challenge now that data collection is easy and datasets take on more complex structures, such as mixed types. Two problems arise: how to measure distances between categorical variables, and how to measure distances for mixed variables. The current study proposes an algorithm that defines the distance between categorical values by their ability to distinguish other related variables. For continuous variables, the variables are first normalized and weighted Euclidean distances are calculated. The two distances are then combined into a final distance. Hierarchical clustering was used to verify the performance of the proposed method on several real-world datasets, compared against the methods of previous papers. The experimental results showed that the proposed method was comparable with other methods and had the best overall average performance; the technique can be applied to all types of data. In addition, by visualizing the proposed distance matrix with heat maps, we found that the number of cluster patterns matched the number of class levels in the majority of our examples.
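A Gower-style combination of range-normalized numeric differences and categorical mismatches is one simple way to realise the "combine two distances" step described above (the weights, attribute ranges, and toy records below are assumptions, not the thesis's exact formulation):

```python
def mixed_distance(x, y, numeric_idx, ranges, w_num=1.0, w_cat=1.0):
    """A Gower-style sketch: range-normalized absolute differences
    for numeric attributes, simple mismatch for categorical ones,
    averaged with user-chosen weights (w_num, w_cat are assumptions)."""
    total, weight = 0.0, 0.0
    for k, (a, b) in enumerate(zip(x, y)):
        if k in numeric_idx:
            total += w_num * abs(a - b) / ranges[k]
            weight += w_num
        else:
            total += w_cat * (0.0 if a == b else 1.0)
            weight += w_cat
    return total / weight

# Two hypothetical records: (age, income, occupation)
x = (25, 30000, "engineer")
y = (35, 50000, "teacher")
ranges = {0: 50, 1: 100000}         # assumed attribute ranges
d = mixed_distance(x, y, numeric_idx={0, 1}, ranges=ranges)
print(round(d, 4))  # (10/50 + 20000/100000 + 1) / 3 = 0.4667
```

Filling an n-by-n matrix with such pairwise values yields exactly the dissimilarity matrix that hierarchical clustering consumes.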
APA, Harvard, Vancouver, ISO, and other styles
19

Lin, Shu-Han, and 林書漢. "Apply Extended Self-Organizing Map to Analyze Mixed-Type Data." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/98921067198051589700.

Full text
Abstract:
Master's thesis
Yunlin University of Science and Technology
Master's Program, Department of Information Management
Academic year 98 (ROC calendar)
Mixed numeric and categorical data are commonly seen in today's corporate databases, in which precious patterns may be hidden. Analyzing mixed-type data to extract the hidden patterns valuable to decision-making is therefore beneficial and critical for corporations to remain competitive. In addition, visualization facilitates exploration in the early stage of data analysis. In this work, we present a visualized approach for analyzing multivariate mixed-type data. The proposed framework, based on an extended self-organizing map, allows visualized data cluster analysis as well as classification. We demonstrate the feasibility of the approach by analyzing two real-world datasets and compare our approach with other existing models to show its advantages.
APA, Harvard, Vancouver, ISO, and other styles
20

Kung, Chien-hao, and 龔建豪. "Apply Distance Learning with Hierarchical Tree for Mixed-Type Data." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/66422541067077899584.

Full text
Abstract:
Master's thesis
National Yunlin University of Science and Technology
Master's Program, Department of Information Management
Academic year 101 (ROC calendar)
Data analysis is widely used in fields such as biometrics, financial marketing, and weather forecasting; experts use it to extract hidden knowledge in their domains. In the real world, data are usually of mixed type, consisting of numerical and categorical values. However, most data analysis methods assume that all data are either numeric or categorical. Moreover, categorical data are hard to handle because their values cannot be calculated on directly. In this study, we aim to enhance performance by improving the way similarity between categorical values is measured in data analysis. First, we combine Co-occurrence between Feature values and Class (COFC), DIstance Learning for Categorical Attributes (DILCA) and the Data-Intensive Similarity measure for Categorical data (DISC) with distance hierarchy to turn categorical values into numerical data. Experiments on synthetic and real datasets were conducted, and the results demonstrated the effectiveness of our approach.
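The co-occurrence idea behind DILCA-style distance learning can be sketched as follows: two categorical values are close when they induce similar conditional distributions over a related attribute. The toy records and the Euclidean profile distance are illustrative assumptions, not the exact measures combined in the thesis:

```python
from collections import Counter

def conditional_profile(value, attr, context_attr, records):
    """P(context | attr = value): how a categorical value co-occurs
    with the values of a related attribute."""
    ctx = Counter(r[context_attr] for r in records if r[attr] == value)
    total = sum(ctx.values())
    return {k: v / total for k, v in ctx.items()}

def value_distance(v1, v2, attr, context_attr, records):
    """Euclidean distance between co-occurrence profiles -- a
    simplified sketch of distance learning for categorical values."""
    p1 = conditional_profile(v1, attr, context_attr, records)
    p2 = conditional_profile(v2, attr, context_attr, records)
    keys = set(p1) | set(p2)
    return sum((p1.get(k, 0) - p2.get(k, 0)) ** 2 for k in keys) ** 0.5

# Hypothetical records: (weather, activity)
records = [
    {"weather": "sunny", "activity": "beach"},
    {"weather": "sunny", "activity": "beach"},
    {"weather": "cloudy", "activity": "beach"},
    {"weather": "cloudy", "activity": "museum"},
    {"weather": "rainy", "activity": "museum"},
    {"weather": "rainy", "activity": "museum"},
]
# "sunny" and "cloudy" co-occur with activities more similarly
# than "sunny" and "rainy" do.
d_sc = value_distance("sunny", "cloudy", "weather", "activity", records)
d_sr = value_distance("sunny", "rainy", "weather", "activity", records)
print(d_sc < d_sr)  # True
```

Such learned distances between categorical values are what gets plugged into a numeric method in place of arbitrary 1-of-k distances.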
APA, Harvard, Vancouver, ISO, and other styles
21

Lin, Zih-Hui, and 林姿慧. "Extending Structure Adaptive Self-Organizing Map for Clustering Mixed-type Data." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/03195500277699490423.

Full text
Abstract:
Master's thesis
National Yunlin University of Science and Technology
Master's Program, Department of Information Management
Academic year 95 (ROC calendar)
The self-organizing map (SOM) is an unsupervised neural network that projects high-dimensional data onto a two-dimensional map. However, the traditional SOM fixes the structure of the map and cannot dynamically add neurons; when used for classification, it has low accuracy. In addition, the traditional SOM cannot appropriately deal with categorical data. The extending structure adaptive self-organizing map (ESASOM) integrates the structure adaptive self-organizing map with distance hierarchy, providing both dynamic splitting for improved classification performance and the ability to handle categorical data. To reveal the cluster structure among the neurons of a trained ESASOM, similar neurons need to be grouped. In this paper, we propose a scheme for clustering the trained ESASOM that can be applied to classification and also reflects the cluster structure of the data. Experimental results demonstrate that the proposed method helps users identify a set of possible clusterings of the training dataset, from which they can choose according to their preference.
APA, Harvard, Vancouver, ISO, and other styles
22

ROCCO, GIORGIA. "Multilevel mixed-type data analysis for validating partitions of scrapie isolates." Doctoral thesis, 2017. http://hdl.handle.net/11573/1095347.

Full text
Abstract:
The dissertation arises from a joint study with the Department of Food Safety and Veterinary Public Health of the Istituto Superiore di Sanità. The aim is to investigate and validate the existence of distinct strains of the scrapie disease taking into account the availability of an a priori benchmark partition formulated by researchers. Scrapie of small ruminants is caused by prions, which are unconventional infectious agents of proteinaceous nature affecting humans and animals. Due to the absence of nucleic acids, which precludes direct analysis of strain variation by molecular methods, the presence of different sheep scrapie strains is usually investigated by bioassay in laboratory rodents. Data are collected by an experimental study on scrapie conducted at the Istituto Superiore di Sanità by experimental transmission of scrapie isolates to bank voles. We aim to discuss the validation of a given partition in a statistical classification framework using a multi-step procedure. Firstly, we use unsupervised classification to see how alternative clustering results match researchers' understanding of the heterogeneity of the isolates. We discuss whether and how clustering results can eventually be exploited to extend the preliminary partition elicited by researchers. Then we motivate the subsequent partition validation based on the predictive performance of several supervised classifiers. Our data-driven approach contains two main methodological original contributions. We advocate the use of partition validation measures to investigate a given benchmark partition: firstly we discuss the issue of how the data can be used to evaluate a preliminary benchmark partition and eventually modify it with statistical results to find a conclusive partition that could be used as a "gold standard" in future studies. Moreover, collected data have a multilevel structure and for each lower-level unit, mixed-type data are available.
Each step in the procedure is then adapted to deal with multilevel mixed-type data. We extend distance-based clustering algorithms to handle multilevel mixed-type data, whereas in supervised classification we propose a two-step approach that classifies the higher-level units starting from the lower-level observations. In this framework, we also define an ad-hoc cross-validation algorithm.
APA, Harvard, Vancouver, ISO, and other styles
23

Lu, Yu-Ting, and 呂郁婷. "Apply Dimensionality Reduction Technique and Distance Learning for Mixed-type Data Visualization." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/91496866094353828247.

Full text
Abstract:
Master's thesis
National Yunlin University of Science and Technology
Department of Information Management
Academic year 103 (ROC calendar)
Data with mixed types of attributes are common in real-life data mining applications. However, most traditional clustering algorithms are limited to handling data that contain only numeric or only categorical attributes. Moreover, in various domains dimensionality reduction is important for data analysis and visualization: it transforms high-dimensional data into a meaningful representation of reduced dimensionality, typically a two-dimensional space. In recent years, distance learning algorithms that can effectively handle mixed-type data have been proposed. In this work, we propose an approach that integrates dimensionality reduction with distance learning, and we examine whether processing categorical values with distance learning performs better than the original dimensionality reduction methods. Experimental results indicate that the proposed approach outperforms the traditional method.
APA, Harvard, Vancouver, ISO, and other styles
24

Chuang, Kai-Ting, and 莊凱婷. "Apply BatchGSOM Approach to Improve the Performance for Mixed-type Data Analysis." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/86597505061914820368.

Full text
Abstract:
Master's thesis
National Yunlin University of Science and Technology
Department of Information Management
Academic year 103 (ROC calendar)
Analyzing big data to find valuable and useful information or knowledge is one of today's most active topics. Real-world data are complicated and usually consist of different types of attributes, such as numeric and categorical attributes, so analyzing mixed-type data is not straightforward. The Generalizing Self-Organizing Map (GSOM) is an effective tool for visualizing high-dimensional data and can handle this problem: it calculates the distance between categorical values via distance hierarchy. However, the training process of GSOM follows that of the Self-Organizing Map (SOM): in the weight-updating phase, neurons are updated by one input instance at a time, and this instance-by-instance update of weights is time-consuming. In this study, we propose integrating the batch update algorithm into the training process of the GSOM model; the resulting BatchGSOM model runs faster than the stepwise update algorithm. Experiments on synthetic and real-world datasets were conducted, and we compared the performance of BatchGSOM with GSOM. The results demonstrated the effectiveness of the BatchGSOM model and showed improved performance on mixed-type datasets.
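The speed difference between stepwise and batch SOM training comes from updating each neuron once per epoch with a neighborhood-weighted mean, instead of once per input instance. A minimal sketch of the batch rule (the crisp neighborhood and the 1-D toy map are assumptions; GSOM's distance-hierarchy handling of categorical values is omitted here):

```python
def batch_update(weights, data, bmu, neighborhood):
    """One epoch of the batch SOM rule: each neuron's new weight is
    the neighborhood-weighted mean of all inputs, computed once per
    epoch rather than once per input (the speed-up described above).

    bmu(x, weights) -> index of the best-matching unit for input x
    neighborhood(j, c) -> influence of unit c's win on unit j
    """
    n_units, dim = len(weights), len(weights[0])
    num = [[0.0] * dim for _ in range(n_units)]
    den = [0.0] * n_units
    for x in data:
        c = bmu(x, weights)
        for j in range(n_units):
            h = neighborhood(j, c)
            den[j] += h
            for d in range(dim):
                num[j][d] += h * x[d]
    return [[num[j][d] / den[j] if den[j] else weights[j][d]
             for d in range(dim)] for j in range(n_units)]

# Toy 1-D map with two units and a crisp neighborhood (assumption).
weights = [[0.0], [10.0]]
data = [[1.0], [2.0], [8.0], [9.0]]
bmu = lambda x, w: min(range(len(w)), key=lambda j: abs(x[0] - w[j][0]))
neighborhood = lambda j, c: 1.0 if j == c else 0.0
new_weights = batch_update(weights, data, bmu, neighborhood)
print(new_weights)  # [[1.5], [8.5]]
```

Each unit moves to the mean of the inputs it won, in a single pass, which is why the batch variant scales better on large datasets.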
APA, Harvard, Vancouver, ISO, and other styles
25

Cheng, Fu-Chou, and 鄭富州. "Mixed-type data clustering approach to cell formation in Cellular Manufacturing System." Thesis, 2003. http://ndltd.ncl.edu.tw/handle/kqz8fr.

Full text
Abstract:
Master's thesis
Chung Yuan Christian University
Graduate Institute of Applied Mathematics
Academic year 91 (ROC calendar)
Group Technology (GT) is a management strategy that affects most areas of a company. Its impact on productivity is so important that it cannot be underestimated; it is also a manufacturing philosophy for improving the productivity of a manufacturing system. To implement a GT system successfully, one has to understand its impact on system performance, the functioning of the different departments, and the technologies that can assist the implementation. Used well, it can lead to economic benefits and job satisfaction. Cell formation (CF), one of the most important problems faced in designing cellular manufacturing systems, involves identifying families of similar parts. A part family is a group of parts with similar geometry that require a similar production process. Traditional schemes, such as classification and coding and production flow analysis, do not consider uncertain, symbolic or fuzzy data. In this study, we apply a mixed-type data clustering algorithm to cell formation. Some examples are demonstrated by applying the proposed method to real data.
APA, Harvard, Vancouver, ISO, and other styles
26

Tai, Wei-Shen, and 戴偉勝. "A Growing Self-Organizing Map for Visualization of Multivariate Mixed-Type Data." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/55926465515801425185.

Full text
Abstract:
Doctoral dissertation
National Yunlin University of Science and Technology
PhD Program, Department of Information Management
Academic year 100 (ROC calendar)
Nowadays, abundant multivariate mixed-type data, including numeric as well as categorical attributes, are ubiquitous in a variety of applications, so processing and analyzing such mixed-type data has become an important issue in data mining. Via visualization models, one can understand and analyze the relationships within complicated data more easily. The Self-Organizing Map (SOM) possesses an effective visualization capability for presenting the characteristics of high-dimensional data on a low-dimensional map, allowing valuable information to be extracted efficiently from large amounts of data. More recently, numerous variants of SOM have been devised to address the deficiencies of conventional SOMs, such as the fixed-size map, topological preservation and mixed-type data. To overcome the constraint of a fixed-size map, diverse flexible map structures have been proposed in many growing SOMs. On the other hand, a modified update function that considers the map-space distances between data has been used to enhance the topological preservation of projected results on a predetermined map. Nevertheless, no current model offers a plausible solution to all of these problems simultaneously. In this study, a Growing Mixed-type SOM (GMixSOM) is proposed to overcome the above deficiencies by integrating a new dynamic structure scheme, visualization-induced updating and distance hierarchy in one model. Experimental results demonstrate that the proposed model is a feasible way to handle multivariate mixed-type data and to reflect the data-space distances between data on a map with a flexible structure.
APA, Harvard, Vancouver, ISO, and other styles
27

Huang, Wei-Hao, and 黃韋皓. "Apply Distance Hierarchy and Dimensionality Reduction to Classification of Mixed-Type Data." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/03834995704641262756.

Full text
Abstract:
Master's thesis
National Yunlin University of Science and Technology
Master's Program, Department of Information Management
Academic year 100 (ROC calendar)
An integrated dimensionality reduction technique with distance hierarchy is proposed that can handle mixed-type data, reduce the dimensionality of the data, and visualize the data on a 2D map. There are two aspects to the integration. First, distance hierarchy (DH) is applied to handle categorical values, which are mapped onto the DH. In contrast to 1-of-k coding, DH considers the semantics inherent in categorical values, so the topological order in the data is better preserved. Second, t-SNE is employed to reduce data dimensionality, transforming the data from a high-dimensional space to a low-dimensional one; t-SNE is better than its counterparts at separating classes in the lower-dimensional space. We use weighted k-NN to evaluate the classification performance of DH versus 1-of-k coding in both the original data space and the projection space, and we demonstrate the superiority of DH over 1-of-k coding by analyzing two synthetic datasets and five real-world datasets.
APA, Harvard, Vancouver, ISO, and other styles
28

WU, JHEN-WEI, and 吳振維. "Apply Dimensionality Reduction Techniques for High Dimensional Mixed-type Data Visualization and Analysis." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/42rbtw.

Full text
Abstract:
Master's thesis
National Yunlin University of Science and Technology
Department of Information Management
Academic year 104 (ROC calendar)
Visualization is a useful technique in data analysis, especially in the initial stage of data exploration. Since high-dimensional data cannot be viewed directly, dimensionality reduction techniques are usually used to reduce the data to a lower dimension, say two, for visualization. In previous studies, dimensionality reduction was investigated in the context of numeric datasets. Nevertheless, most real-world datasets are of mixed type, containing both numeric and categorical attributes. In this case, a conventional approach can neither handle the data directly nor produce the expected result. In this study, we propose a framework that applies dimensionality reduction with distance learning to high-dimensional mixed-type datasets. We also present a method to compare the quality of projection results yielded by different distance learning algorithms. Finally, we propose an approach to extract significant features and visualize patterns from the projection map chosen according to the quality measures. Experiments on real-world datasets were conducted to demonstrate the feasibility of the proposed approach.
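One simple way to compare the quality of projection results, in the spirit of the quality measures mentioned above, is to check how well k-nearest-neighbor sets are preserved after projection. This particular measure and the toy data are an illustrative stand-in, not necessarily the measure used in the thesis:

```python
def knn_indices(points, i, k, dist):
    """Indices of the k nearest neighbors of points[i]."""
    others = [j for j in range(len(points)) if j != i]
    return set(sorted(others, key=lambda j: dist(points[i], points[j]))[:k])

def neighborhood_preservation(high, low, k, dist):
    """Average overlap of k-nearest-neighbor sets before (high) and
    after (low) projection: 1.0 means local structure is fully kept."""
    n = len(high)
    score = 0.0
    for i in range(n):
        score += len(knn_indices(high, i, k, dist) &
                     knn_indices(low, i, k, dist)) / k
    return score / n

euclid = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

# Hypothetical 3-D data and a faithful 2-D projection of it.
high = [(0, 0, 0), (1, 0, 0), (0, 5, 0), (1, 5, 0)]
low = [(0, 0), (1, 0), (0, 5), (1, 5)]
print(neighborhood_preservation(high, low, k=1, dist=euclid))  # 1.0
```

Running the same score on projections produced by different distance learning algorithms gives a common scale on which to compare them.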
APA, Harvard, Vancouver, ISO, and other styles
29

Wang, Jiali. "Recent developments of copula-based models to handle missing data of mixed-type in multivariate analysis." Phd thesis, 2018. http://hdl.handle.net/1885/163716.

Full text
Abstract:
In this thesis, we propose innovative imputation models to handle missing data of mixed-type. Our imputation models can handle 1) multilevel data sets through random effects; 2) heterogeneity in a population by specifying infinite mixture models; and 3) a large number of variables using graphical lasso methods. Two clinical data sets, a randomised control trial of acute stroke care patients and a survey of menstrual disorder among teenagers, are used for the real data application examples, although we believe that the proposed methods can also be applied to other data sets with similar structures. In Chapter 2, we propose a copula based method to handle missing values in multivariate data of mixed type in multilevel data sets. Building upon the extended rank likelihood approach combined with a multinomial probit model formulation, our model is a latent variable model which is able to capture the relationship among variables of different types as well as accounting for the clustering structure. Our proposed method is evaluated through simulations using both artificial data and the acute stroke data set to compare it with several conventional methods of handling missing data. We conclude that our proposed copula based imputation model for mixed type variables achieves good imputation accuracy and recovery of parameters in some models of interest, and that adding random effects enhances performance when the clustering effect is strong. In Chapter 3, we consider an infinite mixture of elliptical copulas induced by a Dirichlet process mixture to build a flexible copula function as the imputation model. A slice sampling algorithm is used in conjunction with a prior parallel tempering algorithm to sample from the infinite dimensional parameter space and to overcome the mixing issue when sampling from a multimodal distribution. 
Using simulations, we demonstrate that the infinite mixture copula model provides a better overall fit than its single-component counterparts, and performs better at capturing tail dependence features of the data. The application of this model is also demonstrated using the acute stroke data set. In Chapter 4, we propose a Gaussian copula model with a graphical lasso prior to analyse the conditional associations among more than 100 questions in a study of menstrual disorder among teenagers. Our data come from a large population-based study of menstrual disorder in Australian teenagers conducted in 2005 and 2016. We also compare cohort differences in menstruation over the 11-year interval and use the model to predict girls at higher risk of developing endometriosis. The model is based on the model proposed in Chapter 2, but with a graphical lasso prior to shrink the elements of the precision matrix of the Gaussian distribution and encourage a sparse graphical structure. The level of shrinkage adapts to the strength of the conditional associations among questions in the survey. We find that menstrual disturbance was reported more frequently in 2016 than a decade earlier, and that the questions in the questionnaire form several clusters with strong associations.
APA, Harvard, Vancouver, ISO, and other styles
30

Karaganis, Milana. "Small Area Estimation for Survey Data: A Hierarchical Bayes Approach." 2009. http://hdl.handle.net/1993/3207.

Full text
Abstract:
Model-based estimation techniques have been widely used in small area estimation. This thesis focuses on the Hierarchical Bayes (HB) estimation techniques in application to small area estimation for survey data. We will study the impact of applying spatial structure to area-specific effects and utilizing a specific generalized linear mixed model in comparison with a traditional Fay-Herriot estimation model. We will also analyze different loss functions with applications to a small area estimation problem and compare estimates obtained under these loss functions. Overall, for the case study under consideration, area-specific geographical effects will be shown to have a significant effect on estimates. As well, using a generalized linear mixed model will prove to be more advantageous than the usual Fay-Herriot model. We will also demonstrate the benefits of using a weighted balanced-type loss function for the purpose of balancing the precision of estimates with their closeness to the direct estimates.
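A weighted balanced-type loss of the kind mentioned trades off closeness to the direct survey estimate against closeness to the parameter itself; the minimizing estimator is a convex blend of the direct and model-based estimates. A minimal sketch under that assumption (the numbers are hypothetical):

```python
def balanced_loss_estimate(direct, bayes, omega):
    """Estimator minimizing a weighted balanced-type loss
    L(theta, d) = omega*(d - direct)^2 + (1 - omega)*(d - theta)^2
    in expectation: a convex blend of the direct survey estimate
    and the model-based (Bayes) estimate."""
    return omega * direct + (1.0 - omega) * bayes

# Hypothetical small-area estimates: a noisy direct estimate and a
# smoother model-based one, blended with omega = 0.3.
print(balanced_loss_estimate(120.0, 100.0, 0.3))  # 106.0
```

Larger omega keeps the result closer to the direct estimate, which is exactly the precision-versus-closeness balance the abstract describes.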
APA, Harvard, Vancouver, ISO, and other styles
31

Koh, Kim Hong. "Type I error rates for multi-group confirmatory maximum likelihood factor analysis with ordinal and mixed item format data : a methodology for construct comparability." Thesis, 2003. http://hdl.handle.net/2429/15975.

Full text
Abstract:
Construct comparability studies are of importance in the context of test validation for psychological and educational measures. The most commonly used scale-level methodology for evaluating construct comparability is Multi-Group Confirmatory Factor Analysis (MGCFA). More specifically, the use of the normal-theory Maximum Likelihood (ML) estimation method and the Pearson covariance matrix in MGCFA has become increasingly common in day-to-day research, given that estimation methods for ordinal variables require large sample sizes and are limited to 20-25 items. The thesis investigated the statistical properties of the ML estimation method and Pearson covariance matrix in two commonly found contexts: measures with ordinal response formats (binary and Likert-type items) and measures with mixed item formats (wherein some of the items are binary and the remainder are ordered polytomous items). Two simulation studies were conducted to reflect data typically found in psychological measures and educational achievement tests, respectively. The results of Study 1 show that the number of scale points does not inflate the empirical Type I error rates of the ML chi-square difference test when the ordinal variables approximate a normal distribution. Rather, increasing skewness led to inflation of the empirical Type I error rates. In Study 2, the results indicate that mixed item formats and sample size combinations have no effect on the inflation of the empirical Type I error rates when the item response distributions are, again, approximately normal. Implications of the findings and future studies were discussed and recommendations provided for applied researchers.
Education, Faculty of
Educational and Counselling Psychology, and Special Education (ECPS), Department of
Graduate
APA, Harvard, Vancouver, ISO, and other styles
32

Martins, Sequeira Ana Micaela. "Global distribution models for whale sharks : assessing occurrence trends of highly migratory marine species." Thesis, 2013. http://hdl.handle.net/2440/81551.

Full text
Abstract:
The processes driving the distribution and abundance patterns of highly migratory marine species, such as filter-feeding sharks, remain largely unexplained. The whale shark (Rhincodon typus Smith 1828) is a filter-feeding chondrichthyan that can reach > 18 m in total length, making it the largest extant fish species. Its geographic range has been defined as all tropical and warm temperate waters around the globe. However, even though mitochondrial and microsatellite DNA studies have revealed low genetic differentiation among the three major ocean basins, most studies of the species focus on the scale of single aggregations. Our understanding of the species' ecology is therefore based on only a small proportion of its life stages, such that we cannot yet adequately explain its biology and movement patterns (Chapter I). I present a worldwide conceptual model of possible whale shark migration routes, while suggesting a novel perspective for quantifying the species' behaviour and ecology. This model can be used to trim the hypotheses related to whale shark movements and aggregation timings, thereby isolating possible mating and breeding areas that are currently unknown (Chapter II). In the next chapter, I quantify seasonal suitable habitat availability in the Indian Ocean (an ocean basin-scale study) by applying generalised linear, spatial mixed-effects and maximum entropy models to produce maps of whale shark habitat suitability (Chapter III). I then assess the inter-annual variation in known whale shark occurrences to unearth temporal trends in a large area of the Indian Ocean. The results from the Indian Ocean suggest both temporal and spatial variability in whale shark occurrence (Chapter IV). Therefore, I applied the same analysis to the Atlantic and Pacific Oceans using similar broad-scale datasets.
While the results for the Pacific Ocean were inconclusive with respect to temporal trends, in the Atlantic Ocean I found preliminary evidence for a cyclic regularity in whale shark occurrence (Chapter V). In Chapter VI, I build a model to predict global whale shark habitat suitability for the present, as well as within a climate change scenario for 2070. Finally, Chapter VII provides a general discussion of the work developed within this thesis and presents ideas for future research.
Thesis (Ph.D.) -- University of Adelaide, School of Earth and Environmental Sciences, 2013
APA, Harvard, Vancouver, ISO, and other styles
33

Nepivodová, Linda. "Vlastními slovy studentů a podle výsledků testů: Smíšený výzkum porovnávající dva způsoby administrace testů" [In Students' Own Words and Through Test Results: A Mixed-Methods Study Comparing Two Modes of Test Administration]. Doctoral thesis, 2018. http://www.nusl.cz/ntk/nusl-375561.

Full text
APA, Harvard, Vancouver, ISO, and other styles
