Dissertations / Theses on the topic 'Mixed data types'
Koomson, Obed. "Performance Assessment of The Extended Gower Coefficient on Mixed Data with Varying Types of Functional Data." Digital Commons @ East Tennessee State University, 2018. https://dc.etsu.edu/etd/3512.
Apitzsch, Cecilia, and Josefin Ryeng. "Cluster Analysis of Mixed Data Types in Credit Risk : A study of clustering algorithms to detect customer segments." Thesis, Umeå universitet, Institutionen för matematik och matematisk statistik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-172594.
GIORDANI, ILARIA. "Relational clustering for knowledge discovery in life sciences." Doctoral thesis, Università degli Studi di Milano-Bicocca, 2010. http://hdl.handle.net/10281/7830.
Abouzeid, Shadi. "A visual interactive grouping analysis tool (VIGAT) that takes mixed data types as input and provides visually interactive overlapping groups as output." Thesis, University of Strathclyde, 2004. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.401309.
Sun, Jinhui. "Robust Feature Screening Procedures for Mixed Type of Data." Diss., Virginia Tech, 2016. http://hdl.handle.net/10919/73709.
Ph. D.
Engardt, Sara. "Unsupervised learning with mixed type data : for detecting money laundering." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-230891.
The purpose of this master's thesis is to perform a cluster analysis on parts of Handelsbanken's customer database. The idea is to investigate whether this can help in identifying typical customers involved in illegal activities such as money laundering. First, a literature study is conducted to investigate which algorithm is best suited to solving the problem. The customer database consists of data with both numerical and categorical attributes. An extended Kohonen network (self-organising map) and the k-prototypes algorithm are used for the clustering. The results show that there are clusters in the data, but in the presence of noise. More work is needed to handle missing values among the attributes.
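As an illustration of the k-prototypes algorithm named in the abstract above, the following is a minimal sketch of the dissimilarity it is usually described with: squared Euclidean distance on the numeric attributes plus a weight `gamma` times the number of mismatching categorical attributes. The function names, the `gamma` default, and the toy customer records are assumptions made for illustration; they are not taken from the thesis.

```python
def k_prototypes_dissimilarity(x_num, x_cat, proto_num, proto_cat, gamma=1.0):
    """Squared Euclidean distance on numeric attributes plus
    gamma times the count of mismatching categorical attributes."""
    numeric_part = sum((a - b) ** 2 for a, b in zip(x_num, proto_num))
    categorical_part = sum(1 for a, b in zip(x_cat, proto_cat) if a != b)
    return numeric_part + gamma * categorical_part


def nearest_prototype(x_num, x_cat, prototypes, gamma=1.0):
    """Return the index of the prototype with the smallest dissimilarity.

    `prototypes` is a list of (numeric_values, categorical_values) pairs.
    """
    costs = [
        k_prototypes_dissimilarity(x_num, x_cat, p_num, p_cat, gamma)
        for p_num, p_cat in prototypes
    ]
    return min(range(len(costs)), key=costs.__getitem__)


# Hypothetical toy records: (scaled balance, scaled age) and (segment, channel).
prototypes = [
    ([0.1, 0.2], ["private", "online"]),
    ([0.9, 0.8], ["corporate", "branch"]),
]
assigned = nearest_prototype([0.2, 0.3], ["private", "branch"], prototypes, gamma=0.5)
```

The weight `gamma` controls how much a categorical mismatch counts relative to numeric differences, so it has to be chosen with the scale of the numeric attributes in mind.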
Codd, Casey. "A Review and Comparison of Models and Estimation Methods for Multivariate Longitudinal Data of Mixed Scale Type." The Ohio State University, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=osu1398686513.
Koufakou, Anna. "SCALABLE AND EFFICIENT OUTLIER DETECTION IN LARGE DISTRIBUTED DATA SETS WITH MIXED-TYPE ATTRIBUTES." Doctoral diss., University of Central Florida, 2009. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/3431.
Ph.D.
School of Electrical Engineering and Computer Science
Engineering and Computer Science
Computer Engineering PhD
Chu, Shuyu. "Change Detection and Analysis of Data with Heterogeneous Structures." Diss., Virginia Tech, 2017. http://hdl.handle.net/10919/78613.
Ph. D.
Wahi, Rabbani Rash-ha. "Towards an understanding of the factors associated with severe injuries to cyclists in crashes with motor vehicles." Thesis, Queensland University of Technology, 2018. https://eprints.qut.edu.au/121426/1/Rabbani%20Rash-Ha_Wahi_Thesis.pdf.
Chevallier, Juliette. "Statistical models and stochastic algorithms for the analysis of longitudinal Riemanian manifold valued data with multiple dynamic." Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLX059/document.
Beyond transversal studies, the temporal evolution of phenomena is a field of growing interest. To understand a phenomenon, it appears more suitable to compare the evolution of its markers over time than to do so at a given stage. The follow-up of neurodegenerative disorders is carried out via the monitoring of cognitive scores over time. The same applies to chemotherapy monitoring: rather than the aspect or size of tumors, oncologists assess that a given treatment is efficient from the moment it results in a decrease of tumor volume. The study of longitudinal data is not restricted to medical applications and proves successful in various fields of application such as computer vision, automatic detection of facial emotions, social sciences, etc.
Mixed effects models have proved their efficiency in the study of longitudinal data sets, especially for medical purposes. Recent works (Schiratti et al., 2015, 2017) allowed the study of complex data, such as anatomical data. The underlying idea is to model the temporal progression of a given phenomenon by continuous trajectories in a space of measurements, which is assumed to be a Riemannian manifold. Then, both a group-representative trajectory and inter-individual variability are estimated. However, these works assume a unidirectional dynamic and fail to encompass situations like multiple sclerosis or chemotherapy monitoring. Indeed, such diseases follow a chronic course, with phases of worsening, stabilization and improvement, inducing changes in the global dynamic.
The thesis is devoted to the development of methodological tools and algorithms suited for the analysis of longitudinal data arising from phenomena that undergo multiple dynamics, and to their application to chemotherapy monitoring. We propose a nonlinear mixed effects model which allows the estimation of a representative piecewise-geodesic trajectory of the global progression together with spatial and temporal inter-individual variability. Particular attention is paid to estimation of the correlation between the different phases of the evolution. This model provides a generic and coherent framework for studying longitudinal manifold-valued data.
Estimation is formulated as a well-defined maximum a posteriori problem which we prove to be consistent under mild assumptions. Numerically, due to the non-linearity of the proposed model, the estimation of the parameters is performed through a stochastic version of the EM algorithm, namely the Markov chain Monte-Carlo stochastic approximation EM (MCMC-SAEM). The convergence of the SAEM algorithm toward local maxima of the observed likelihood has been proved and its numerical efficiency has been demonstrated. However, despite appealing features, the limit position of this algorithm can strongly depend on its starting position. To cope with this issue, we propose a new version of the SAEM in which we do not sample from the exact distribution in the expectation phase of the procedure. We first prove the convergence of this algorithm toward local maxima of the observed likelihood. Then, in the spirit of simulated annealing, we propose an instantiation of this general procedure to favor convergence toward global maxima: the tempering-SAEM.
Elamin, Obbey Ahmed. "Nonparametric kernel estimation methods for discrete conditional functions in econometrics." Thesis, University of Manchester, 2013. https://www.research.manchester.ac.uk/portal/en/theses/nonparametric-kernel-estimation-methods-for-discrete-conditional-functions-in-econometrics(d443e56a-dfb8-4f23-bfbe-ec98ecac030b).html.
Full textBushel, Pierre Robert. "Clustering of mixed data types with application to toxicogenomics." 2005. http://www.lib.ncsu.edu/theses/available/etd-03172005-091928/unrestricted/etd.pdf.
Full textHuang, Pei-Yuan, and 黃沛媛. "Fuzzy Clustering Algorithms for the Mixed Types of Data." Thesis, 2001. http://ndltd.ncl.edu.tw/handle/12616104324570956371.
Chung Yuan Christian University (中原大學)
Graduate Institute of Mathematics (數學研究所)
89
There are several methods for clustering data, such as divisive, hierarchical, k-means, and fuzzy c-means methods. However, these methods are mostly used for numerical data. Few works deal with mixed types of numerical, symbolic, and fuzzy data. This thesis presents fuzzy clustering algorithms for mixed-type data (i.e., data composed of numerical, symbolic, and fuzzy components) by adopting fuzzy c-means (FCM) [4]. It is mainly based on the definition of symbolic distance proposed by Diday [3] and Gowda & Diday [5,6], and on the definition of fuzzy distance proposed by Hathaway & Bezdek [8]. In the process, these two distances were found to run counter to intuition. Therefore, an appropriate amendment is made, and better results are obtained. Finally, a real example that includes mixed-type data is given. The method proposed in this thesis produces good results; that is, it can be used to classify both mixed-type data and data of a single type.
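To make the fuzzy c-means step referred to in the abstract above concrete, here is a minimal sketch of the standard FCM membership update for a single observation, given its distances to each cluster center. The function name and the handling of zero distances are assumptions for illustration. The symbolic and fuzzy distances of the thesis are not reproduced; any non-negative dissimilarity could supply the input distances.

```python
def fcm_memberships(distances, m=2.0):
    """Standard fuzzy c-means membership update for one observation.

    distances: distance from the observation to each cluster center.
    m: fuzzifier (> 1); larger values give softer memberships.
    """
    # If the observation coincides with one or more centers,
    # share full membership equally among those centers.
    zeros = [i for i, d in enumerate(distances) if d == 0]
    if zeros:
        return [1.0 / len(zeros) if i in zeros else 0.0
                for i in range(len(distances))]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_j) ** exponent for d_j in distances)
        for d_i in distances
    ]


# An observation three times closer to the first center than to the second.
memberships = fcm_memberships([1.0, 3.0], m=2.0)
```

The memberships of one observation always sum to one, and the closer center receives the larger share.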
Hancock, Timothy Peter. "Multivariate consensus trees: tree-based clustering and profiling for mixed data types." Thesis, 2006. https://researchonline.jcu.edu.au/17497/1/01front.pdf.
Ching, Billy K. S. "Analysis of longitudinal data of mixed types using a state space model approach." Thesis, 1997. http://hdl.handle.net/2429/6389.
Wang, Jiang-Shan, and 王江山. "Improved Learning Vector Quantization for Mixed-Type Data." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/77276310988050953362.
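National Yunlin University of Science and Technology (國立雲林科技大學)
Master's Program, Department of Information Management (資訊管理系碩士班)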
99
With the rapid growth of electronic business, each enterprise holds a large amount of electronic data, such as customer information and transaction records. Most of the data owned by companies nowadays includes both categorical and numeric data. Learning Vector Quantization (LVQ) is a classification technique that can deal with a large amount of data, which makes it suitable for data exploration in enterprises. Traditional LVQ cannot handle categorical data directly; the data must first be converted. A typical conversion is 1-of-k coding. However, after the conversion, categorical data cannot keep its original structure, which results in some classification errors. In this study, we propose an improved LVQ (ILVQ) which integrates distance hierarchy for handling mixed-type data. Experiments on synthetic and real-world datasets were conducted, and the results demonstrated the effectiveness of the ILVQ.
Jiang, Shun-Mao, and 姜順貿. "Calculation of Dissimilarity Matrix for Mixed-type Data." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/3tnd58.
National Chengchi University (國立政治大學)
Department of Statistics (統計學系)
106
Clustering is a common method in data mining. It requires information about the distance between observations. Because data are now so easy to collect, datasets have more complex structures, such as mixed types, and defining the distance has become a major challenge. Two problems arise: how to measure distances between categorical variables and how to measure distances for mixed variables. The current study proposes an algorithm that defines the distance between the values of a categorical variable by their ability to distinguish other related variables. For continuous variables, the variables are first normalized and weighted Euclidean distances are calculated. The two distances are then combined into a final distance. Hierarchical clustering was used to verify the performance of the proposed method on real-world data, in comparison with methods from previous work. The experimental results showed that the proposed method was comparable with other methods and that its overall average performance was the best. The technique can be applied to all types of data. In addition, by visualizing the proposed distance matrix with heat maps, we found that the number of cluster patterns was the same as the number of class levels in the majority of our examples.
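The general shape of such a mixed-type dissimilarity matrix can be sketched as follows. This is only an illustration under stated assumptions: the numeric part uses min-max normalization and a weighted Euclidean distance as the abstract describes, but the categorical part is a plain mismatch proportion standing in for the learned categorical distance of the thesis, and the convex combination with weight `alpha` is a hypothetical choice, since the abstract does not say how the two distances are combined.

```python
import math


def normalize_columns(rows):
    """Min-max scale each numeric column to [0, 1] (constant columns map to 0)."""
    scaled_cols = []
    for col in zip(*rows):
        lo, hi = min(col), max(col)
        span = hi - lo
        scaled_cols.append([(v - lo) / span if span else 0.0 for v in col])
    return [list(r) for r in zip(*scaled_cols)]


def mixed_distance_matrix(num_rows, cat_rows, weights=None, alpha=0.5):
    """Symmetric dissimilarity matrix for observations with numeric and
    categorical parts.

    alpha weights the numeric distance, (1 - alpha) the categorical one.
    The categorical distance here is a simple mismatch proportion (an assumption).
    """
    scaled = normalize_columns(num_rows)
    n = len(scaled)
    weights = weights or [1.0] * len(scaled[0])
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d_num = math.sqrt(sum(
                w * (a - b) ** 2
                for w, a, b in zip(weights, scaled[i], scaled[j])
            ))
            d_cat = sum(a != b for a, b in zip(cat_rows[i], cat_rows[j])) / len(cat_rows[i])
            matrix[i][j] = matrix[j][i] = alpha * d_num + (1 - alpha) * d_cat
    return matrix


# Three hypothetical observations: one numeric and one categorical attribute each.
D = mixed_distance_matrix([[0.0], [10.0], [5.0]], [["a"], ["b"], ["a"]])
```

A matrix of this form can be passed to any hierarchical clustering routine that accepts precomputed dissimilarities.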
Lin, Shu-Han, and 林書漢. "Apply Extended Self-Organizing Map to Analyze Mixed-Type Data." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/98921067198051589700.
Yunlin University of Science and Technology (雲林科技大學)
Master's Program, Department of Information Management (資訊管理系碩士班)
98
Mixed numeric and categorical data are common in today's corporate databases, in which valuable patterns may be hidden. Analyzing mixed-type data to extract the hidden patterns valuable to decision-making is therefore beneficial and critical for corporations to remain competitive. In addition, visualization facilitates exploration in the early stage of data analysis. In this paper, we present a visualized approach for analyzing multivariate mixed-type data. The proposed framework, based on an extended self-organizing map, allows visualized data cluster analysis as well as classification. We demonstrate the feasibility of the approach by analyzing two real-world datasets and compare our approach with other existing models to show its advantages.
Kung, Chien-hao, and 龔建豪. "Apply Distance Learning with Hierarchical Tree for Mixed-Type Data." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/66422541067077899584.
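National Yunlin University of Science and Technology (國立雲林科技大學)
Master's Program, Department of Information Management (資訊管理系碩士班)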
101
Data analysis is widely used in fields such as biometrics, financial marketing, and weather forecasting. Experts use data analysis to extract hidden knowledge in their domains. In the real world, data are usually of mixed type, consisting of numerical and categorical values. However, most data analysis methods assume that all data are either numeric or categorical. Moreover, categorical data are hard to handle because their values cannot be used in calculations directly. In this study, we aim to enhance performance by improving the way similarity between categorical values is measured in data analysis. We combine Co-occurrence between feature values and class (COFC), DIstance Learning for Categorical Attributes (DILCA), and Data-intensive similarity measure for categorical data (DISC) with distance hierarchy to turn categorical values into numerical data. Experiments on synthetic and real datasets were conducted, and the results demonstrated the effectiveness of our approach.
Lin, Zih-Hui, and 林姿慧. "Extending Structure Adaptive Self-Organizing Map for Clustering Mixed-type Data." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/03195500277699490423.
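National Yunlin University of Science and Technology (國立雲林科技大學)
Master's Program, Department of Information Management (資訊管理系碩士班)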
95
The self-organizing map (SOM) is an unsupervised neural network which can project high-dimensional data onto a two-dimensional map. However, the traditional SOM fixes the structure of the map and cannot dynamically add neurons. When used for classification, the traditional SOM has low accuracy. In addition, the traditional SOM cannot appropriately deal with categorical data. The extending structure adaptive self-organizing map (ESASOM) integrates the structure adaptive self-organizing map with distance hierarchy, so that it can both split neurons dynamically to improve classification performance and handle categorical data. To show the cluster structure among neurons on the trained ESASOM, similar neurons need to be grouped. In this paper, we propose a scheme for clustering the trained ESASOM which can not only be applied to classification but also reflect the cluster structure of the data. Experimental results demonstrate that the proposed method can help users identify a set of possible clusterings of the training dataset, from which they can choose according to their preferences.
ROCCO, GIORGIA. "Multilevel mixed-type data analysis for validating partitions of scrapie isolates." Doctoral thesis, 2017. http://hdl.handle.net/11573/1095347.
Lu, Yu-Ting, and 呂郁婷. "Apply Dimensionality Reduction Technique and Distance Learning for Mixed-type Data Visualization." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/91496866094353828247.
National Yunlin University of Science and Technology (國立雲林科技大學)
Department of Information Management (資訊管理系)
103
Data with mixed types of attributes are common in real-life data mining applications. However, most traditional clustering algorithms are limited to handling data that contain only numeric or only categorical attributes. Moreover, in various domains, dimensionality reduction is important for data analysis and visualization; it transforms high-dimensional data into a meaningful representation of reduced dimensionality, typically a two-dimensional space. In recent years, some distance learning algorithms which can effectively handle mixed-type data have been proposed. In this work, we propose a dimensionality reduction approach integrated with distance learning and examine whether processing categorical values with distance learning performs better than the original dimensionality reduction methods. Experimental results indicate that the proposed approach outperforms the traditional method.
Chuang, Kai-Ting, and 莊凱婷. "Apply BatchGSOM Approach to Improve the Performance for Mixed-type Data Analysis." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/86597505061914820368.
National Yunlin University of Science and Technology (國立雲林科技大學)
Department of Information Management (資訊管理系)
103
Analyzing big data to find valuable and useful information or knowledge is one of the topics of greatest concern nowadays. Real-world data are complicated and usually consist of different types of attributes, such as numeric and categorical attributes. Analyzing mixed-type data is not straightforward. The Generalizing Self Organizing Map (GSOM) is an effective tool for the visualization of high-dimensional data, and it can handle mixed-type data. GSOM calculates the distance between categorical values via distance hierarchy. However, the training process of GSOM is like that of the Self Organizing Map (SOM): in the weight updating phase, neurons are updated with one input instance at a time. This instance-by-instance update of weights is time-consuming. In this study, we propose integrating the batch update algorithm into the training process of the GSOM model; the result is called BatchGSOM. The BatchGSOM model runs faster than the stepwise update algorithm. Experiments on synthetic and real-world datasets were conducted, and we compared the performance of BatchGSOM with that of GSOM. The results demonstrated the effectiveness of the BatchGSOM model, which improved performance on mixed-type datasets.
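The difference between stepwise and batch training can be illustrated with a minimal sketch of one batch epoch for a plain numeric SOM on a one-dimensional map. This is not the BatchGSOM of the thesis: distance hierarchy for categorical values is left out, and the function names, the Gaussian neighborhood, and the `sigma` parameter are assumptions for illustration. In the batch form, every neuron is recomputed once per epoch as a neighborhood-weighted mean of all inputs, instead of being nudged after each instance.

```python
import math


def best_matching_unit(x, weights):
    """Index of the neuron whose weight vector is closest to x."""
    return min(
        range(len(weights)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(x, weights[k])),
    )


def batch_update(data, weights, sigma=1.0):
    """One batch epoch on a 1-D map where neuron k sits at grid position k.

    Each new weight is the mean of all inputs, weighted by a Gaussian
    neighborhood between the neuron and each input's best-matching unit.
    """
    bmus = [best_matching_unit(x, weights) for x in data]
    new_weights = []
    for k in range(len(weights)):
        numerator = [0.0] * len(weights[k])
        denominator = 0.0
        for x, c in zip(data, bmus):
            h = math.exp(-((c - k) ** 2) / (2 * sigma ** 2))
            denominator += h
            numerator = [n + h * v for n, v in zip(numerator, x)]
        # Keep the old weight if the neighborhood mass is numerically zero.
        new_weights.append(
            [n / denominator for n in numerator] if denominator > 0 else list(weights[k])
        )
    return new_weights


# Two inputs and two neurons; the neighborhood pulls both weights toward the middle.
updated = batch_update([[0.0], [1.0]], [[0.0], [1.0]], sigma=1.0)
```

Because the best-matching units are computed once for the whole data set, the result of an epoch does not depend on the order in which the instances are presented.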
Cheng, Fu-Chou, and 鄭富州. "Mixed-type data clustering approach to cell formation in Cellular Manufacturing System." Thesis, 2003. http://ndltd.ncl.edu.tw/handle/kqz8fr.
Chung Yuan Christian University (中原大學)
Graduate Institute of Applied Mathematics (應用數學研究所)
91
Group Technology (GT) is a management strategy that affects most areas of a company. Its impact on productivity is too important to underestimate. It is also a manufacturing philosophy for improving the productivity of manufacturing systems. To implement a GT system successfully, one has to understand its impact on system performance, the functioning of the different departments, and the technologies that can assist the implementation. If used well, it can lead to economic benefits and job satisfaction. Cell formation (CF), one of the most important problems faced in designing cellular manufacturing systems, involves identifying families of similar parts. A part family is a group of parts that have similar geometry and require a similar production process. Traditional schemes such as classification and coding and production flow analysis do not consider uncertainty or symbolic or fuzzy data. In this study, we apply a mixed-type data clustering algorithm to cell formation. The proposed method is demonstrated on examples with real data.
Tai, Wei-Shen, and 戴偉勝. "A Growing Self-Organizing Map for Visualization of Multivariate Mixed-Type Data." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/55926465515801425185.
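National Yunlin University of Science and Technology (國立雲林科技大學)
Doctoral Program, Department of Information Management (資訊管理系博士班)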
100
Nowadays, abundant multivariate mixed-type data, including numeric as well as categorical attributes, are ubiquitous in a variety of applications. Therefore, processing and analyzing such mixed-type data has become an important issue in the data mining field. Via visualization models, one is able to understand and analyze the relationships between complicated data more easily. The Self-Organizing Map (SOM) possesses an effective visualization capability for presenting the characteristics of high-dimensional data on a low-dimensional map. One can efficiently extract valuable information from a large amount of data by means of SOM mapping results. More recently, many variants of SOM have been devised to address deficiencies of conventional SOMs related to the fixed-size map, topological preservation, and mixed-type data. To overcome the constraint of a fixed-size map, diverse flexible map structures were proposed in many growing SOMs. On the other hand, a modified update function that considers the map-space distances between data was used to enhance the topological preservation of projection results on a predetermined map. Nevertheless, none of the current models offers a plausible solution to all of the foregoing problems simultaneously. In this study, a Growing Mixed-type SOM (GMixSOM) is proposed to overcome the abovementioned deficiencies by integrating a new dynamic structure scheme, visualization-induced update, and distance hierarchy in one model. Experimental results demonstrated that the proposed model is a feasible solution for handling multivariate mixed-type data and reflecting the data-space distance between data on a map with a flexible structure.
Huang, Wei-Hao, and 黃韋皓. "Apply Distance Hierarchy and Dimensionality Reduction to Classification of Mixed-Type Data." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/03834995704641262756.
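National Yunlin University of Science and Technology (國立雲林科技大學)
Master's Program, Department of Information Management (資訊管理系碩士班)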
100
An integrated dimensionality reduction technique with distance hierarchy is proposed, which can handle mixed-type data, reduce the dimensionality of the data, and visualize the data on a 2D map. There are two aspects of the integration. First, distance hierarchy (DH) is applied to handle categorical values, which are mapped to the DH. In contrast to 1-of-k coding, DH considers the semantics inherent in categorical values, and therefore the topological order in the data can be better preserved. Second, t-SNE is employed to reduce data dimensionality by transforming the data from a high-dimensional space to a low-dimensional space. t-SNE is better than its counterparts at separating classes in the lower-dimensional space. We use weighted K-NN to evaluate the classification performance of DH and of 1-of-k coding, both in the original data space and in the projection space. We demonstrate the superiority of DH over 1-of-k coding by analyzing two synthetic datasets and five real-world datasets.
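The contrast between 1-of-k coding and a distance hierarchy can be sketched in a few lines. The sketch is simplified and rests on assumptions: the drink hierarchy is hypothetical, every link has unit weight, and only leaf-to-leaf distances are computed, whereas the distance hierarchy described in this line of work is more elaborate. It shows that 1-of-k coding places every pair of distinct categories at the same distance, while a tree can keep related values closer together.

```python
import math


def one_of_k_distance(a, b, categories):
    """Euclidean distance between the 1-of-k (one-hot) codes of a and b."""
    va = [1.0 if c == a else 0.0 for c in categories]
    vb = [1.0 if c == b else 0.0 for c in categories]
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(va, vb)))


# Hypothetical concept hierarchy: child -> parent, all links with unit weight.
PARENT = {
    "Coke": "Carbonated",
    "Pepsi": "Carbonated",
    "Coffee": "Hot",
    "Carbonated": "Drink",
    "Hot": "Drink",
}


def path_to_root(node, parent):
    """List of nodes from `node` up to the root of the hierarchy."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def hierarchy_distance(a, b, parent):
    """Number of links on the tree path between a and b."""
    steps_from_a = {n: steps for steps, n in enumerate(path_to_root(a, parent))}
    for steps_from_b, n in enumerate(path_to_root(b, parent)):
        if n in steps_from_a:  # first common ancestor
            return steps_from_a[n] + steps_from_b
    raise ValueError("values do not share a root")


categories = ["Coke", "Pepsi", "Coffee"]
flat_near = one_of_k_distance("Coke", "Pepsi", categories)
flat_far = one_of_k_distance("Coke", "Coffee", categories)
tree_near = hierarchy_distance("Coke", "Pepsi", PARENT)
tree_far = hierarchy_distance("Coke", "Coffee", PARENT)
```

Here `flat_near` and `flat_far` are equal, while `tree_near` is smaller than `tree_far`, which is the kind of semantic ordering the abstract credits to DH.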
WU, JHEN-WEI, and 吳振維. "Apply Dimensionality Reduction Techniques for High Dimensional Mixed-type Data Visualization and Analysis." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/42rbtw.
National Yunlin University of Science and Technology (國立雲林科技大學)
Department of Information Management (資訊管理系)
104
Visualization is a useful technique in data analysis, especially in the initial stage, data exploration. Since high-dimensional data cannot be visualized directly, dimensionality reduction techniques are usually used to reduce the data to a lower dimension, say two, for visualization. In previous studies, dimensionality reduction was investigated in the context of numeric datasets. Nevertheless, most real-world datasets are of mixed type, containing both numeric and categorical attributes. In this case, a conventional approach can neither handle the data directly nor produce the expected result. In this study, we propose a framework which applies dimensionality reduction with distance learning to high-dimensional mixed-type datasets. We also present a method to compare the quality of projection results yielded by different distance learning algorithms. Finally, we propose an approach to extract significant features and visualize patterns from the projection map chosen according to quality measures. Experiments on real-world datasets were conducted to demonstrate the feasibility of the proposed approach.
Wang, Jiali. "Recent developments of copula-based models to handle missing data of mixed-type in multivariate analysis." Phd thesis, 2018. http://hdl.handle.net/1885/163716.
Karaganis, Milana. "Small Area Estimation for Survey Data: A Hierarchical Bayes Approach." 2009. http://hdl.handle.net/1993/3207.
Koh, Kim Hong. "Type I error rates for multi-group confirmatory maximum likelihood factor analysis with ordinal and mixed item format data : a methodology for construct comparability." Thesis, 2003. http://hdl.handle.net/2429/15975.
Education, Faculty of
Educational and Counselling Psychology, and Special Education (ECPS), Department of
Graduate
Martins, Sequeira Ana Micaela. "Global distribution models for whale sharks : assessing occurrence trends of highly migratory marine species." Thesis, 2013. http://hdl.handle.net/2440/81551.
Thesis (Ph.D.) -- University of Adelaide, School of Earth and Environmental Sciences, 2013
Nepivodová, Linda. "Vlastními slovy studentů a podle výsledků estů: Smíšený výzkum porovnávající dva způsoby adminisrace testů." Doctoral thesis, 2018. http://www.nusl.cz/ntk/nusl-375561.