Dissertations / Theses on the topic 'Pre-categorical and post-categorical selection'
Stemp, Iain Charles. "Bayesian model selection ideas for categorical data." Thesis, University of Cambridge, 1992. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.308335.
Zahid, Faisal Maqbool. "Regularization and Variable Selection in Categorical Regression Analyses." München: Verlag Dr. Hut, 2011. http://d-nb.info/1014848423/34.
Hjerpe, Adam. "Computing Random Forests Variable Importance Measures (VIM) on Mixed Numerical and Categorical Data." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-185496.
Random Forest (RF) is a popular predictive model that has shown good results in a large number of application studies. The model gives high prediction accuracy, can model complex high-dimensional data, and has also performed well with intercorrelated predictor variables. This project examines a measure obtained from the RF model, the variable importance measure (VIM), for quantifying the degree of association between the predictor variables and the target variable. The project examines the sensitivity of VIM to qualitative predictor noise and its ability to distinguish predictive variables from variables that, with respect to the target variable, describe only noise. Distinguishing predictive variables in supervised learning can be used to increase the robustness of classifiers, increase prediction accuracy and reduce data dimensionality, and VIM can be used as a tool for exploring relationships between the predictor variables and the target variable.
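One common VIM in random forests is the permutation ("mean decrease in accuracy") measure: shuffle one predictor and see how much accuracy drops. The sketch below illustrates only that permutation idea on a small mixed numerical/categorical data set. It is not the thesis's procedure: it substitutes a hypothetical hand-written rule for a trained random forest and scores accuracy on the given rows rather than on out-of-bag samples, and the data and feature names are invented for illustration.

```python
import random
from typing import Callable, Sequence

Row = Sequence[object]


def accuracy(predict: Callable[[Row], int], X: list, y: list) -> float:
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy when one feature column is randomly shuffled.

    Works for numerical and categorical columns alike, because shuffling
    only reorders existing values and never does arithmetic on them.
    """
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild each row with column j replaced by its shuffled value.
            X_perm = [tuple(row[:j]) + (v,) + tuple(row[j + 1:])
                      for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical data: (numeric x, categorical colour, categorical noise).
gen = random.Random(1)
X = [(gen.random(), gen.choice(["red", "green", "blue"]), gen.choice(["a", "b"]))
     for _ in range(200)]


def rule(row):
    """Stand-in classifier (not a random forest): uses x and colour only."""
    x, colour, _noise = row
    return int(x > 0.5 or colour == "red")


y = [rule(row) for row in X]
vim = permutation_importance(rule, X, y)
print(dict(zip(["x", "colour", "noise"], vim)))
```

Because the stand-in rule never reads the third column, shuffling it leaves accuracy unchanged and its importance is exactly zero, which is the "noise-only variable" case the abstract describes.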
Li, Junjie. "Some algorithmic studies in high-dimensional categorical data clustering and selection number of clusters." HKBU Institutional Repository, 2008. http://repository.hkbu.edu.hk/etd_ra/1011.
Guo, Lei. "Bayesian Biclustering on Discrete Data: Variable Selection Methods." Thesis, Harvard University, 2013. http://dissertations.umi.com/gsas.harvard:11201.
Full textStatistics
Tam, Hak Ping. "Preliminary variable selection and data preparation strategies for configural frequency analysis and other categorical multivariate techniques." The Ohio State University, 1992. http://rave.ohiolink.edu/etdc/view?acc_num=osu1487779439845611.
Løvlie, Hanne. "Pre- and post-copulatory sexual selection in the fowl, Gallus gallus." Doctoral thesis, Department of Zoology, Stockholm University, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-6865.
Demary, Kristian C. "Connecting pre- and post-mating episodes of sexual selection in Photinus greeni fireflies." Thesis, Tufts University, 2005.
Adviser: Sara M. Lewis. Submitted to the Dept. of Biology. Includes bibliographical references. Access restricted to members of the Tufts University community. Also available via the World Wide Web.
Dougherty, Liam R. "Pre- and post-copulatory sexual selection in two species of lygaeid seed bug." Thesis, University of St Andrews, 2015. http://hdl.handle.net/10023/7246.
Trillo, Paula Alejandra. "Pre- and post-copulatory sexual selection in the tortoise beetle Acromis sparsa (Coleoptera: Chrysomelidae)." Missoula, Mont.: The University of Montana, 2008. http://etd.lib.umt.edu/theses/available/etd-03212009-144120/unrestricted/Trillo_umt_0136D_10003.pdf.
Densley, Landon T. "Hiring Practices for Graphic Designers In Utah County, Utah." Diss., Brigham Young University, 2004. http://contentdm.lib.byu.edu/ETD/image/etd489.pdf.
Louw, Nelmarie. "Aspects of the pre- and post-selection classification performance of discriminant analysis and logistic regression." Thesis, Stellenbosch University, 1997. http://hdl.handle.net/10019.1/55402.
One copy microfiche.
ENGLISH ABSTRACT: Discriminant analysis and logistic regression are techniques that can be used to classify entities of unknown origin into one of a number of groups. However, the underlying models and assumptions for applying the two techniques differ. In this study, the two techniques are compared with respect to the classification of entities. First, they were compared in situations where no data-dependent variable selection took place. Several underlying distributions were studied: the normal, double exponential and lognormal distributions. The number of variables, the sample sizes from the different groups and the correlation structure between the variables were varied to obtain a large number of different configurations. The cases of two and three groups were studied. The most important conclusions are as follows: for normal and double exponential data, linear discriminant analysis outperforms logistic regression, especially when the ratio of the number of variables to the total sample size is large. For lognormal data, logistic regression should be preferred, except when the ratio of the number of variables to the total sample size is large. Variable selection is frequently the first step in statistical analyses. A large number of potentially important variables are observed, and an optimal subset has to be selected for use in further analyses. Although variable selection is often used, the influence of a selection step on further analyses of the same data is often completely ignored. An important aim of this study was to develop new selection techniques for use in discriminant analysis and logistic regression. New estimators of the post-selection error rate were also developed. A new selection technique, cross model validation (CMV), which can be applied in both discriminant analysis and logistic regression, was developed.
This technique combines the selection of variables and the estimation of the post-selection error rate. It provides a method to determine the optimal model dimension, to select the variables for the final model and to estimate the post-selection error rate of the discriminant rule. An extensive Monte Carlo simulation study comparing the CMV technique with existing procedures in the literature was undertaken. In general, this technique outperformed the other methods, especially with respect to the accuracy of estimating the post-selection error rate. Finally, pre-test-type variable selection was considered. A pre-test estimation procedure was adapted for use as a selection technique in linear discriminant analysis. In a simulation study, this technique was compared with CMV and was found to perform well, especially with respect to correct selection. However, it is valid only for uncorrelated normal variables, so its applicability is limited. A numerically intensive approach was used throughout the study, since the problems investigated are not amenable to an analytical approach.
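The point that a selection step influences later analyses of the same data can be illustrated with a small simulation. The sketch below is not CMV; it is a generic cross-validation comparison under assumed settings (pure-noise Gaussian features, selection by absolute difference of class means, a nearest-centroid classifier, all hypothetical choices). Selecting variables once on the full data and then cross-validating lets the test rows influence the selection, while repeating selection inside each fold keeps them out, which is the distinction behind an honest post-selection error estimate.

```python
import random
from statistics import mean


def select_features(X, y, k):
    """Rank features by |mean(class 1) - mean(class 0)| and keep the top k."""
    scores = []
    for j in range(len(X[0])):
        m1 = mean(row[j] for row, label in zip(X, y) if label == 1)
        m0 = mean(row[j] for row, label in zip(X, y) if label == 0)
        scores.append((abs(m1 - m0), j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]


def nearest_centroid(X_train, y_train, features):
    """Return a classifier assigning each row to the closer class centroid."""
    centroids = {}
    for c in (0, 1):
        rows = [row for row, label in zip(X_train, y_train) if label == c]
        centroids[c] = [mean(row[j] for row in rows) for j in features]

    def predict(row):
        dist = {c: sum((row[j] - m) ** 2 for j, m in zip(features, cent))
                for c, cent in centroids.items()}
        return min(dist, key=dist.get)

    return predict


def cv_accuracy(X, y, k, n_folds, select_inside):
    """Cross-validated accuracy with selection done inside or outside the folds."""
    n = len(y)
    # Naive variant: features chosen once using all rows, test rows included.
    fixed = None if select_inside else select_features(X, y, k)
    correct = 0
    for f in range(n_folds):
        test = set(range(f, n, n_folds))
        X_tr = [X[i] for i in range(n) if i not in test]
        y_tr = [y[i] for i in range(n) if i not in test]
        features = select_features(X_tr, y_tr, k) if select_inside else fixed
        predict = nearest_centroid(X_tr, y_tr, features)
        correct += sum(predict(X[i]) == y[i] for i in test)
    return correct / n


rng = random.Random(0)
n, p, k, folds = 40, 200, 5, 5
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [i % 2 for i in range(n)]  # labels unrelated to X: true accuracy is 0.5

naive = cv_accuracy(X, y, k, folds, select_inside=False)
proper = cv_accuracy(X, y, k, folds, select_inside=True)
print(f"selection outside CV: {naive:.2f}, inside CV: {proper:.2f}")
```

On pure-noise data like this, the first estimate is typically well above 0.5 while the second stays close to it, because only the second treats selection as part of the rule being evaluated.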
Yuan, Qingcong. "Informational Index and Its Applications in High Dimensional Data." UKnowledge, 2017. http://uknowledge.uky.edu/statistics_etds/28.
Ke, Chenlu. "A New Independence Measure and Its Applications in High Dimensional Data Analysis." UKnowledge, 2019. https://uknowledge.uky.edu/statistics_etds/41.
Duchateau, Fabien. "Towards a Generic Approach for Schema Matcher Selection: Leveraging User Pre- and Post-match Effort for Improving Quality and Time Performance." Montpellier 2, 2009. http://www.theses.fr/2009MON20213.
Interoperability between applications, or bridges between data sources, is required to allow optimal information exchange. Yet some of the processes needed for this integration cannot be fully automated because of their complexity. One of these processes, called matching, has been studied for years. It aims at discovering semantic correspondences between the elements of data sources and is still largely performed manually. Deploying large data-sharing systems therefore requires (semi-)automating the matching process. Many schema matching tools have been designed to discover mappings between schemas. However, some of these tools target matching tasks with specific criteria, such as a large-scale scenario or the discovery of complex mappings. Moreover, unlike the ontology alignment field, schema matching has no common platform for evaluating them. The abundance of schema matching tools, together with these two issues, makes it hard for a user to choose the most appropriate tool for a given scenario. In this dissertation, our first contribution is a benchmark, XBenchMatch, for evaluating schema matching tools. It consists of several schema matching scenarios, each featuring one or more criteria. In addition, we have designed new measures to evaluate the quality of integrated schemas and the user's post-match effort. This study and analysis of existing matching tools gives a better understanding of the matching process. Without external resources, most matching tools are unable to detect a mapping between elements with totally dissimilar labels. Conversely, they cannot reject a mapping between elements with similar labels. Our second contribution, BMatch, is a matching tool that includes a structural similarity measure and aims to solve these issues using only the schema structure. Terminological measures enable the discovery of mappings whose schema elements share similar labels.
Conversely, structural measures, based on the cosine measure, detect mappings when schema elements have the same neighbourhood. The second aspect of BMatch aims at improving time performance by using an indexing structure, the B-tree, to accelerate the schema matching process. We empirically demonstrate the benefits and limits of our approach. Like most schema matching tools, BMatch uses an aggregation function to combine similarity values, which has several drawbacks in terms of quality and performance. Tuning the parameters is another burden for the user. To tackle these issues, MatchPlanner introduces a new method for combining similarity measures that relies on decision trees. Because decision trees can be learned, parameters are tuned automatically and similarity measures are computed only when necessary. We show that our approach improves matching quality and time performance compared with other matching tools. We also let users express a preference between precision and recall. Even with tuning capabilities, schema matching tools are still not generic enough to provide acceptable quality for most schema matching scenarios. We finally extend MatchPlanner by proposing a factory of schema matchers, named YAM (Yet Another Matcher). This tool brings more flexibility, since it generates an "à la carte" matcher for a given schema matching scenario. Indeed, schema matchers can be seen as machine learning classifiers, since they classify pairs of schema elements as either relevant or irrelevant. Thus, the matcher with the best matching quality is built and selected from a set of different classifiers. We also show the impact on quality when the user provides some inputs, namely a list of expert mappings and a preference between precision and recall.
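The structural measure described in this abstract compares schema elements by their neighbourhoods using a cosine measure. A minimal sketch follows, assuming a schema given as a hypothetical parent-to-children dictionary and defining an element's neighbourhood as the bag of labels of its children and siblings. BMatch's actual neighbourhood definition, weighting and B-tree indexing are not reproduced here.

```python
from collections import Counter
from math import sqrt


def neighbourhood(schema: dict, element: str) -> Counter:
    """Bag of labels of the element's children and siblings (assumed definition)."""
    labels = Counter(schema.get(element, []))
    for children in schema.values():
        if element in children:
            labels.update(c for c in children if c != element)
    return labels


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bags of labels."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Two hypothetical schemas whose elements "client" and "customer" have
# dissimilar labels but identical neighbourhoods.
schema_a = {"order": ["client", "date"], "client": ["name", "address", "phone"]}
schema_b = {"purchase": ["customer", "date"], "customer": ["name", "address", "phone"]}

print(cosine(neighbourhood(schema_a, "client"), neighbourhood(schema_b, "customer")))
print(cosine(neighbourhood(schema_a, "client"), neighbourhood(schema_b, "date")))
```

A purely terminological measure would score "client" against "customer" poorly, whereas this structural score is 1.0 because both elements share the same children and sibling; "client" against "date" scores 0.0 because their neighbourhoods share no labels.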
Powell, Nina Laurel. "Reasoning and processing of behavioural and contextual information : influences on pre-judgement reasoning, post-judgement information selection and engagement, and moral behaviour." Thesis, University of Birmingham, 2013. http://etheses.bham.ac.uk//id/eprint/4252/.
Poleto, Frederico Zanqueta. "Análise de dados categorizados com omissão" [Analysis of categorical data with missingness]. Universidade de São Paulo, 2006. http://www.teses.usp.br/teses/disponiveis/45/45133/tde-04122007-192457/.
We consider theoretical, computational and applied aspects of classical categorical data analysis with missing data. We present a literature review that introduces the missingness mechanisms, highlighting their characteristics and their implications for the inferences of interest through an example involving two binary responses and through simulation studies. We extend the multinomial modeling scenario described in Paulino (1991, Brazilian Journal of Probability and Statistics 5, 1-42) to the product-multinomial setup to allow the inclusion of explanatory variables. We develop the results in matrix form and implement the computational procedures as subroutines written in the R statistical environment. We illustrate the theory with five examples with different characteristics, fitting structural linear (marginal homogeneity), log-linear (independence, constant adjacent odds ratio) and functional linear models (kappa, weighted kappa, sensitivity/specificity, positive/negative predictive value) for the marginal probabilities. The missingness patterns include missingness in one or two variables and confounded neighbor cells, with or without explanatory variables.
Haouas, Nabiha. "Wind energy analysis and change point analysis." Thesis, Clermont-Ferrand 2, 2015. http://www.theses.fr/2015CLF22554.
Wind energy, one of the most competitive renewable energies, is considered a remedy for the drawbacks of fossil energy. Managing and exploiting this energy well requires forecasts of its production. The forecasting methods used in the literature only forecast the annual mean of this production. Some recent works propose using the Central Limit Theorem (CLT), under non-classical hypotheses, to estimate the mean annual production of wind energy, as well as its variance, for a single turbine. In this thesis, we extend these works to a wind farm by relaxing the hypothesis that the wind speed and the power production are stationary, assuming instead that they are seasonal. Under this hypothesis, the quality of the annual forecast improves considerably. We also suggest planning wind power production for each of the four seasons of the year. A fractal model allows us to find a "natural" division of the wind speed series, refining the estimate of wind production by detecting abrupt change points. Statistical tools for change point detection and for the estimation of fractal models are presented in the last two chapters.
LI, SHAO-PENG, and 李少芃. "Categorical Variable Selection and Level Clustering in Count Data." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/umr3yz.
Liu, Chen-Ying, and 劉振熒. "A Model Selection Technique between Two Empirical Bayes Models for Categorical Data." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/98221355733341962404.
National Chiao Tung University
Institute of Statistics
Academic year 93 (ROC calendar)
This thesis first proposes a model selection technique for choosing between two empirical Bayes models for categorical data in manufacturing. It then introduces two useful empirical Bayes models for such data. Finally, the performance of the proposed method is illustrated with a simulated example.
"Pre- and post-copulatory sexual selection in the tortoise beetle Acromis sparsa (Coleoptera: Chrysomelidae)." University of Montana, 2009. http://pqdtopen.proquest.com/#viewpdf?dispub=3338789.
Li, Yu-Ching, and 李俞青. "Optimal Selection of Indicators and Portfolio by Genetic Algorithms pre- and post- the Financial Crisis." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/47e33d.
Asia University
Department of Finance, In-service Master's Program
Academic year 103 (ROC calendar)
Based on a literature review and multiple fundamental and chip-based indicators, this study employs a genetic algorithm (GA) to screen for the best analysis indicators and threshold values and to determine the most suitable investment portfolio. The data cover 2007 to 2009, the period of the Financial Crisis, with the annual returns of individual stocks as the target. Using the Sortino ratio as the criterion, the average return of the selected blue chips is calculated. Empirical results show that before the crisis, the GA portfolio had the highest return, followed by the index, with the traditional methods lowest. During the crisis, the GA portfolio had the highest return, followed by the traditional methods, with the index lowest; the same ordering held after the crisis. As for the best analysis indicators selected by the GA, the best indicator before the crisis was free cash flow; during the crisis, free cash flow and earnings per share; and after the crisis, free cash flow and the securities-cash ratio. In all three periods, the GA achieved better returns than both the traditional methods and the index. Moreover, the traditional methods outperformed the index both during and after the crisis, suggesting that the financial crisis affected portfolio returns. Free cash flow was the best analysis indicator in all three periods and deserves investors' attention.
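For reference, a common definition of the Sortino ratio, the criterion named in this abstract, is the mean excess return over a target T divided by the downside deviation, the root mean square of shortfalls below T (periods above T count as zero). The thesis may use a different target or variant; the returns below are hypothetical.

```python
from math import sqrt


def sortino_ratio(returns, target=0.0):
    """Mean excess return over `target` divided by the downside deviation.

    Downside deviation = sqrt(mean(min(0, r - target)^2)) over all periods,
    so periods above the target contribute zero rather than being dropped.
    """
    excess = [r - target for r in returns]
    downside = sqrt(sum(min(0.0, e) ** 2 for e in excess) / len(excess))
    if downside == 0.0:
        return float("inf")  # no returns below target: ratio is unbounded
    return (sum(excess) / len(excess)) / downside


# Hypothetical periodic returns of one candidate portfolio.
print(sortino_ratio([0.10, -0.05, 0.02, -0.01]))
```

Unlike the Sharpe ratio, only below-target returns enter the denominator, so upside volatility is not penalized. In a GA search like the one described, a value of this kind could serve as the fitness score of a candidate indicator and threshold set.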
MacEachern, Kathryn Anne. "A Comparison of Categorical vs. Fractional Parental Allocation Based on Microsatellite Markers to Estimate Reproductive Success and Inbreeding Levels Over Three Generations of Selective Breeding in a Closed Population of Rainbow Trout (Oncorhynchus mykiss)." 2011. http://hdl.handle.net/10222/14356.
MSc Thesis
Silvestre, Cláudia Marisa Vasconcelos. "Clustering with discrete mixture models: An integrated approach for model selection." Doctoral thesis, 2014. http://hdl.handle.net/10071/9991.
Research on cluster analysis continues to develop. Identifying the number of clusters and selecting a subset of relevant variables from the data have been active research areas in clustering methods. The approaches proposed for these issues are mostly designed for numerical data and cannot be applied directly to clustering categorical data. This work aims to contribute to this area by handling categorical data.
Ranjineh, Khojasteh Enayatollah. "Geostatistical three-dimensional modeling of the subsurface unconsolidated materials in the Göttingen area." Doctoral thesis, 2013. http://hdl.handle.net/11858/00-1735-0000-0001-BB9A-B.