Dissertations / Theses: 'Pre-categorical and post-categorical selection'

1

Stemp, Iain Charles. "Bayesian model selection ideas for categorical data." Thesis, University of Cambridge, 1992. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.308335.

2

Zahid, Faisal Maqbool. "Regularization and Variable Selection in Categorical Regression Analyses." München: Verlag Dr. Hut, 2011. http://d-nb.info/1014848423/34.

3

Hjerpe, Adam. "Computing Random Forests Variable Importance Measures (VIM) on Mixed Numerical and Categorical Data." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-185496.

Abstract:

The Random Forest model is commonly used as a predictor function, and the model has proven useful in a variety of applications. Its popularity stems from its high prediction accuracy, its ability to model high-dimensional complex data, and its applicability under predictor correlations. This report investigates the random forest variable importance measure (VIM) as a means of ranking important variables. The robustness of the VIM under imputation of categorical noise and its capability to differentiate informative predictors from non-informative variables are investigated. The selection of variables may improve the robustness of the predictor, improve prediction accuracy, reduce computational time, and serve as an exploratory data analysis tool. In addition, the partial dependency plot obtained from the random forest model is examined as a means of finding underlying relations in a non-linear simulation study.
Random Forest (RF) is a popular predictor model that has shown good results in a large number of applied studies. The model gives high prediction accuracy, can model complex high-dimensional data, and has also shown good results with intercorrelated predictor variables. This project investigates a measure obtained from the RF model, the variable importance measure (VIM), for computing the degree of association between predictor variables and the target variable. The project examines the sensitivity of the VIM to qualitative predictor noise and the VIM's ability to differentiate predictive variables from variables that, with respect to the target variable, describe only noise. Differentiating predictive variables in supervised learning can be used to increase the robustness of classifiers, increase prediction accuracy, and reduce data dimensionality, and the VIM can be used as a tool for exploring relationships between predictor variables and the target variable.
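The idea behind a permutation-based variable importance measure can be sketched as follows: shuffle one predictor at a time and record the resulting drop in accuracy. This is a minimal sketch under stated assumptions, not the procedure used in the thesis. The predictor is a hypothetical one-variable threshold rule standing in for a random forest, and the toy data and the names `accuracy` and `permutation_importance` are illustrative only.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    hits = sum(1 for row, y in zip(rows, labels) if predict(row) == y)
    return hits / len(rows)


def permutation_importance(predict, rows, labels, n_repeats=20, seed=0):
    """Mean drop in accuracy when each column is shuffled on its own."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in rows]
            rng.shuffle(column)  # break the link between column j and the label
            permuted = [row[:j] + [v] + row[j + 1:] for row, v in zip(rows, column)]
            drops.append(baseline - accuracy(predict, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical toy data: column 0 determines the label, column 1 is categorical noise.
rng = random.Random(42)
rows = [[rng.random(), rng.choice(["a", "b", "c"])] for _ in range(200)]
labels = [int(row[0] > 0.5) for row in rows]


def predict(row):
    """Stand-in predictor (an assumption; the thesis uses a random forest)."""
    return int(row[0] > 0.5)


vim = permutation_importance(predict, rows, labels)
```

Because the stand-in predictor never reads column 1, shuffling that column leaves every prediction unchanged and its importance is zero, while shuffling column 0 removes most of the predictive signal.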

4

Li, Junjie. "Some algorithmic studies in high-dimensional categorical data clustering and selection number of clusters." HKBU Institutional Repository, 2008. http://repository.hkbu.edu.hk/etd_ra/1011.

5

Guo, Lei. "Bayesian Biclustering on Discrete Data: Variable Selection Methods." Thesis, Harvard University, 2013. http://dissertations.umi.com/gsas.harvard:11201.

Abstract:

Biclustering is a technique for clustering rows and columns of a data matrix simultaneously. Over the past few years, we have seen its applications in biology-related fields, as well as in many data mining projects. As opposed to classical clustering methods, biclustering groups objects that are similar only on a subset of variables. Many biclustering algorithms on continuous data have emerged over the last decade. In this dissertation, we will focus on two Bayesian biclustering algorithms we developed for discrete data, more specifically categorical data and ordinal data.
Statistics

6

Tam, Hak Ping. "Preliminary variable selection and data preparation strategies for configural frequency analysis and other categorical multivariate techniques." The Ohio State University, 1992. http://rave.ohiolink.edu/etdc/view?acc_num=osu1487779439845611.

7

Løvlie, Hanne. "Pre- and post-copulatory sexual selection in the fowl, Gallus gallus." Doctoral thesis, Stockholm : Department of Zoology, Stockholm University, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-6865.

8

Demary, Kristian C. "Connecting pre- and post-mating episodes of sexual selection in Photinus greeni fireflies." Thesis, Tufts University, 2005.

Abstract:

Thesis (Ph.D.)--Tufts University, 2005.
Adviser: Sara M. Lewis. Submitted to the Dept. of Biology. Includes bibliographical references. Access restricted to members of the Tufts University community. Also available via the World Wide Web.

9

Dougherty, Liam R. "Pre- and post-copulatory sexual selection in two species of lygaeid seed bug." Thesis, University of St Andrews, 2015. http://hdl.handle.net/10023/7246.

Abstract:

Sexual selection arises via competition for access to mates, and is thus intimately tied to the social environment. For example, individual mating success may depend strongly on how many rivals or mating partners are available. Studies of mate choice and sexual selection may vary the number of mates a subject is presented with during mating experiments, yet it is not clear how this influences the strength and shape of sexual selection acting on traits in either sex. In this thesis I investigate the effect of social environment on sexual selection acting in two closely-related species of lygaeid seed bug: Lygaeus equestris and Lygaeus simulans. Males in both species possess an extremely elongate intromittent organ, which is over two-thirds average male body length. I show that the strength of pre-copulatory selection acting on male processus length in Lygaeus equestris and genital clasper shape in Lygaeus simulans is significantly influenced by the social context. However, selection on male and female body size in Lygaeus equestris is not. Additionally, I use a meta-analysis of 38 published studies to show that mating preferences are significantly stronger when more than one mate option is available, compared to when only a single option is available. I also investigate the functional morphology of male genital traits in Lygaeus simulans, and use formal selection analysis to quantify the strength of selection acting on these traits before, during and after mating. Finally, I use experimental manipulations in Lygaeus simulans to confirm that male processus length directly influences sperm transfer, and that intact genital claspers are required for successful intromission. Overall, my results illustrate that sexual selection in the wild may vary both spatially and temporally depending on the social environment. It is thus especially important that experiments are performed under ecologically relevant conditions.

10

Trillo, Paula Alejandra. "Pre- and post-copulatory sexual selection in the tortoise beetle Acromis Sparsa (Coleoptera Chrysomelidae)." [Missoula, Mont.] : The University of Montana, 2008. http://etd.lib.umt.edu/theses/available/etd-03212009-144120/unrestricted/Trillo_umt_0136D_10003.pdf.

11

Densley, Landon T. "Hiring Practices for Graphic Designers In Utah County, Utah." Diss., 2004. http://contentdm.lib.byu.edu/ETD/image/etd489.pdf.

12

Louw, Nelmarie. "Aspects of the pre- and post-selection classification performance of discriminant analysis and logistic regression." Thesis, Stellenbosch : Stellenbosch University, 1997. http://hdl.handle.net/10019.1/55402.

Abstract:

Thesis (PhD)--Stellenbosch University, 1997.
One copy microfiche.
ENGLISH ABSTRACT: Discriminant analysis and logistic regression are techniques that can be used to classify entities of unknown origin into one of a number of groups. However, the underlying models and assumptions for application of the two techniques differ. In this study, the two techniques are compared with respect to classification of entities. Firstly, the two techniques were compared in situations where no data-dependent variable selection took place. Several underlying distributions were studied: the normal distribution, the double exponential distribution and the lognormal distribution. The number of variables, sample sizes from the different groups and the correlation structure between the variables were varied to obtain a large number of different configurations. The cases of two and three groups were studied. The most important conclusions are: for normal and double exponential data, linear discriminant analysis outperforms logistic regression, especially in cases where the ratio of the number of variables to the total sample size is large. For lognormal data, logistic regression should be preferred, except in cases where the ratio of the number of variables to the total sample size is large. Variable selection is frequently the first step in statistical analyses. A large number of potentially important variables are observed, and an optimal subset has to be selected for use in further analyses. Despite the fact that variable selection is often used, the influence of a selection step on further analyses of the same data is often completely ignored. An important aim of this study was to develop new selection techniques for use in discriminant analysis and logistic regression. New estimators of the post-selection error rate were also developed. A new selection technique, cross model validation (CMV), that can be applied both in discriminant analysis and logistic regression, was developed. This technique combines the selection of variables and the estimation of the post-selection error rate. It provides a method to determine the optimal model dimension, to select the variables for the final model and to estimate the post-selection error rate of the discriminant rule. An extensive Monte Carlo simulation study comparing the CMV technique to existing procedures in the literature was undertaken. In general, this technique outperformed the other methods, especially with respect to the accuracy of estimating the post-selection error rate. Finally, pre-test type variable selection was considered. A pre-test estimation procedure was adapted for use as a selection technique in linear discriminant analysis. In a simulation study, this technique was compared to CMV, and was found to perform well, especially with respect to correct selection. However, this technique is only valid for uncorrelated normal variables, and its applicability is therefore limited. A numerically intensive approach was used throughout the study, since the problems that were investigated are not amenable to an analytical approach.
AFRIKAANS ABSTRACT: Linear discriminant analysis and logistic regression are techniques that can be used to classify items of unknown origin into one of a number of groups. The underlying models and assumptions for the use of the two techniques differ, however. In this study the two techniques are compared with respect to the classification of items. First, the two techniques are compared in a setting where no data-dependent selection of variables takes place. Several underlying distributions are studied: the normal distribution, the double exponential distribution and the lognormal distribution. The number of variables, the sample sizes from the respective groups and the correlation structure between the variables are varied to obtain a large number of configurations. The cases of two and three groups are studied. The most important conclusions that can be drawn from the study are: for normal and double exponential data, linear discriminant analysis performs better than logistic regression, especially where the ratio of the number of variables to the total sample size is large. For data from a lognormal distribution, logistic regression should be the method of choice, unless the ratio of the number of variables to the total sample size is large. Variable selection is often the first step in statistical analyses. A large number of potentially important variables are observed, and an optimal subset is chosen for use in further analyses. Despite the fact that variable selection is often used, the influence of a selection step on further analyses of the same data is often completely ignored. An important aim of the study was to develop new selection techniques that can be used in discriminant analysis and logistic regression.
Attention was also given to developing estimators of the error rate of a discriminant function formed with selected variables. A new selection technique, cross model validation (CMV), which can be used for the selection of variables in both discriminant analysis and logistic regression, was developed. This technique handles the selection of variables and the estimation of the post-selection error rate in one step, and provides a method to determine the optimal model dimension, to choose the variables to be included in the model, and to estimate the post-selection error rate of the discriminant function. An extensive simulation study comparing the proposed CMV technique with other procedures in the literature was undertaken for both discriminant analysis and logistic regression. In general this technique performed better than the other methods considered, especially with respect to the accuracy with which the post-selection error rate is estimated. Finally, attention was also given to pre-test type selection. A technique was developed that uses a pre-test estimation method to select variables for inclusion in a linear discriminant function. The technique was compared with the CMV technique in a simulation study and performs very well, especially with respect to correct selection. However, this technique is valid only for uncorrelated normal variables, which limits its use. A numerically intensive approach was used throughout the study. This was necessitated by the fact that the problems investigated cannot be handled by means of an analytical approach.
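The abstract's central concern, that a selection step influences later error estimates on the same data, can be illustrated by repeating the selection inside every cross-validation fold so that held-out observations never take part in choosing variables. The sketch below is an illustration of that general principle only and is not the CMV procedure developed in the thesis. The mean-difference ranking, the nearest-centroid rule, the simulated data and all function names are assumptions made for the example.

```python
import random
from statistics import mean


def split_groups(X, y):
    """Return the observations of group 0 and of group 1."""
    g0 = [x for x, lab in zip(X, y) if lab == 0]
    g1 = [x for x, lab in zip(X, y) if lab == 1]
    return g0, g1


def select_variables(X, y, k):
    """Rank variables by absolute difference of group means; keep the top k."""
    g0, g1 = split_groups(X, y)
    p = len(X[0])
    score = [abs(mean(r[j] for r in g0) - mean(r[j] for r in g1)) for j in range(p)]
    return sorted(range(p), key=lambda j: score[j], reverse=True)[:k]


def nearest_centroid_rule(X, y, cols):
    """Build a classifier that assigns x to the closer group centroid."""
    g0, g1 = split_groups(X, y)
    c0 = [mean(r[c] for r in g0) for c in cols]
    c1 = [mean(r[c] for r in g1) for c in cols]

    def classify(x):
        d0 = sum((x[c] - m) ** 2 for c, m in zip(cols, c0))
        d1 = sum((x[c] - m) ** 2 for c, m in zip(cols, c1))
        return 0 if d0 <= d1 else 1

    return classify


def cv_error_with_selection(X, y, k, n_folds=5):
    """Cross-validated error rate with selection redone inside every fold."""
    errors = 0
    for f in range(n_folds):
        train = [i for i in range(len(X)) if i % n_folds != f]
        test = [i for i in range(len(X)) if i % n_folds == f]
        X_tr = [X[i] for i in train]
        y_tr = [y[i] for i in train]
        cols = select_variables(X_tr, y_tr, k)  # held-out fold is not used here
        rule = nearest_centroid_rule(X_tr, y_tr, cols)
        errors += sum(rule(X[i]) != y[i] for i in test)
    return errors / len(X)


# Simulated data (hypothetical): 10 normal variables, only variable 0 separates the groups.
rng = random.Random(1)
y = [i % 2 for i in range(60)]
X = [[rng.gauss(3.0 * lab if j == 0 else 0.0, 1.0) for j in range(10)] for lab in y]

err = cv_error_with_selection(X, y, k=2)
```

Selecting the variables once on the full sample and cross-validating only the classifier would let the held-out observations influence the selection, which tends to make the estimated error rate too optimistic.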

13

Yuan, Qingcong. "INFORMATIONAL INDEX AND ITS APPLICATIONS IN HIGH DIMENSIONAL DATA." UKnowledge, 2017. http://uknowledge.uky.edu/statistics_etds/28.

Abstract:

We introduce a new class of measures for testing independence between two random vectors, which uses the expected difference of conditional and marginal characteristic functions. By choosing a particular weight function in the class, we propose a new index for measuring independence and study its properties. Two empirical versions are developed, and their properties, asymptotics, connections with existing measures and applications are discussed. Implementation and Monte Carlo results are also presented. We propose a two-stage sufficient variable selection method based on the new index to deal with large p, small n data. The method does not require model specification and focuses especially on categorical responses. Our approach always improves on other typical screening approaches, which use only marginal relations. Numerical studies are provided to demonstrate the advantages of the method. We introduce a novel approach to sufficient dimension reduction problems using the new measure. The proposed method requires very mild conditions on the predictors, estimates the central subspace effectively and is especially useful when the response is categorical. It keeps the model-free advantage without estimating a link function. Under regularity conditions, root-n consistency and asymptotic normality are established. Simulation results show that the proposed method is very competitive and robust compared with existing dimension reduction methods.

14

Ke, Chenlu. "A NEW INDEPENDENCE MEASURE AND ITS APPLICATIONS IN HIGH DIMENSIONAL DATA ANALYSIS." UKnowledge, 2019. https://uknowledge.uky.edu/statistics_etds/41.

Abstract:

This dissertation has three consecutive topics. First, we propose a novel class of independence measures for testing independence between two random vectors based on the discrepancy between the conditional and the marginal characteristic functions. If one of the variables is categorical, our asymmetric index extends the typical ANOVA to a kernel ANOVA that can test a more general hypothesis of equal distributions among groups. The index is also applicable when both variables are continuous. Second, we develop a sufficient variable selection procedure based on the new measure in a large p small n setting. Our approach incorporates marginal information between each predictor and the response as well as joint information among predictors. As a result, our method is more capable of selecting all truly active variables than marginal selection methods. Furthermore, our procedure can handle both continuous and discrete responses with mixed-type predictors. We establish the sure screening property of the proposed approach under mild conditions. Third, we focus on a model-free sufficient dimension reduction approach using the new measure. Our method does not require strong assumptions on predictors and responses. An algorithm is developed to find dimension reduction directions using sequential quadratic programming. We illustrate the advantages of our new measure and its two applications in high dimensional data analysis by numerical studies across a variety of settings.

15

Duchateau, Fabien. "Towards a Generic Approach for Schema Matcher Selection : Leveraging User Pre- and Post-match Effort for Improving Quality and Time Performance." Montpellier 2, 2009. http://www.theses.fr/2009MON20213.

Abstract:

Interoperability between applications and bridges between different data sources have become crucial issues for enabling optimal information exchange. However, some of the processes needed for this integration cannot be fully automated because of their complexity. One of these processes, schema matching, has now been studied for many years. It tackles the problem of discovering semantic correspondences between elements of different data sources, but it is still mainly carried out manually. Consequently, deploying large information-sharing systems will be possible only by (semi-)automating this matching process. Many schema matching tools have been developed over recent decades to discover mappings between schema elements automatically. However, these tools generally perform matching tasks for specific criteria, such as a large-scale scenario or the discovery of complex mappings. Unlike research on ontology alignment, there is no common platform for evaluating these tools. The profusion of schema matching tools, combined with the two problems mentioned above, therefore does not make it easy for a user to choose the most appropriate tool for discovering correspondences between schemas. The first contribution of this thesis is an evaluation tool, called XBenchMatch, for measuring the performance (in terms of quality and time) of schema matching tools. A corpus of about ten matching scenarios is provided with XBenchMatch, each of them representing one or more criteria relating to the schema matching process.
We have also designed and implemented new measures for evaluating the quality of integrated schemas and the user's post-match effort. This study of existing tools has led to a better understanding of the schema matching process. The first finding is that, without external resources such as dictionaries or ontologies, these tools are generally unable to discover correspondences between elements with very different labels. Conversely, the use of resources only rarely allows the discovery of correspondences between elements whose labels are similar. Our second contribution, BMatch, is a schema matching tool that includes a structural similarity measure to counter these problems. We then demonstrate empirically the advantages and limits of our approach. Indeed, like most schema matching tools, BMatch uses a weighted average to combine several similarity values, which entails a loss of quality and efficiency. In addition, configuring the various parameters is a further difficulty for the user. To remedy these problems, our tool MatchPlanner introduces a new method for combining similarity measures by means of decision trees. Since these trees can be learned, the parameters are configured automatically and the similarity measures are not systematically applied. We thus show that our approach improves matching quality and execution-time performance compared with existing tools. Finally, we let the user specify a preference between precision and recall.
Although equipped with automatic configuration of their parameters, schema matching tools are not yet generic enough to obtain results of acceptable quality for a majority of scenarios. This is why we extended MatchPlanner by proposing a "factory" of schema matching tools, named YAM (for Yet Another Matcher). This tool brings more flexibility because it generates à la carte matching tools for a given scenario. Indeed, these tools can be regarded as machine learning classifiers, since they classify pairs of schema elements as relevant or not as mappings. Thus, the best matching tool is built and selected from a large set of classifiers. We also measure the impact on quality when the user provides the tool with expert mappings or indicates a preference between precision and recall.
Interoperability between applications or bridges between data sources are required to allow optimal information exchanges. Yet, some processes needed to bring about this integration cannot be fully automated due to their complexity. One of these processes is called matching, and it has now been studied for years. It aims at discovering semantic correspondences between data source elements and is still largely performed manually. Thus, deploying large data sharing systems requires the (semi-)automation of this matching process. Many schema matching tools were designed to discover mappings between schemas. However, some of these tools are intended to fulfil matching tasks with specific criteria, like a large-scale scenario or the discovery of complex mappings. And contrary to the ontology alignment research field, there is no common platform to evaluate them. The abundance of schema matching tools, added to the two previously mentioned issues, does not make it easy for a user to choose the most appropriate tool to match a given scenario. In this dissertation, our first contribution deals with a benchmark, XBenchMatch, to evaluate schema matching tools. It consists of several schema matching scenarios, each of which features one or more criteria. In addition, we have designed new measures to evaluate the quality of integrated schemas and the user post-match effort. This study and analysis of existing matching tools enables a better understanding of the matching process. Without external resources, most matching tools are generally unable to detect a mapping between elements with totally dissimilar labels. Conversely, they cannot invalidate a mapping between elements with similar labels. Our second contribution, BMatch, is a matching tool which includes a structural similarity measure, and it aims at solving these issues by using only the schema structure. Terminological measures enable the discovery of mappings whose schema elements share similar labels.
Conversely, structural measures, based on the cosine measure, detect mappings when schema elements have the same neighbourhood. BMatch's second aspect aims at improving time performance by using an indexing structure, the B-tree, to accelerate the schema matching process. We empirically demonstrate the benefits and the limits of our approach. Like most schema matching tools, BMatch uses an aggregation function to combine similarity values, which implies several drawbacks in terms of quality and performance. Tuning the parameters is another burden for the user. To tackle these issues, MatchPlanner introduces a new method to combine similarity measures by relying on decision trees. As decision trees can be learned, parameters are automatically tuned and similarity measures are only computed when necessary. We show that our approach provides an increase in matching quality and better time performance compared with other matching tools. We also present the possibility of letting users choose a preference between precision and recall. Even with tuning capabilities, schema matching tools are still not generic enough to provide results of acceptable quality for most schema matching scenarios. We finally extend MatchPlanner by proposing a factory of schema matchers, named YAM (for Yet Another Matcher). This tool brings more flexibility since it generates an 'à la carte' matcher for a given schema matching scenario. Indeed, schema matchers can be seen as machine learning classifiers since they classify pairs of schema elements as either relevant or irrelevant. Thus, the best matcher in terms of matching quality is built and selected from a set of different classifiers. We also show the impact on quality when the user provides some inputs, namely a list of expert mappings and a preference between precision and recall.
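The combination of a terminological measure with a cosine-based structural measure through an aggregation function can be sketched as below. This is a minimal illustration and not BMatch's implementation: the abstract does not name the terminological measure, so Python's `difflib.SequenceMatcher` is used as a stand-in, and the element representation (a label plus a list of neighbouring labels), the weights and all function names are assumptions.

```python
from collections import Counter
from difflib import SequenceMatcher
from math import sqrt


def label_similarity(a, b):
    """Terminological similarity in [0, 1] (stand-in measure, case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cosine_similarity(neigh_a, neigh_b):
    """Cosine between bag-of-labels vectors of two element neighbourhoods."""
    va, vb = Counter(neigh_a), Counter(neigh_b)
    dot = sum(va[t] * vb[t] for t in va)
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def aggregate(scores, weights):
    """Weighted average of similarity values."""
    return sum(w * s for s, w in zip(scores, weights)) / sum(weights)


def match_score(elem_a, elem_b, weights=(0.5, 0.5)):
    """Combine terminological and structural similarity for two schema elements."""
    label_a, neigh_a = elem_a
    label_b, neigh_b = elem_b
    scores = [label_similarity(label_a, label_b), cosine_similarity(neigh_a, neigh_b)]
    return aggregate(scores, weights)


# Hypothetical schema elements: dissimilar labels, largely shared neighbourhood.
zip_code = ("zip", ["street", "city", "country"])
postal_code = ("postalCode", ["street", "city", "state"])
score = match_score(zip_code, postal_code)
```

The fixed weights are the kind of parameter the abstract describes as a tuning burden: the same pair can fall above or below a decision threshold depending on how much weight the structural score receives.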

16

Powell, Nina Laurel. "Reasoning and processing of behavioural and contextual information : influences on pre-judgement reasoning, post-judgement information selection and engagement, and moral behaviour." Thesis, University of Birmingham, 2013. http://etheses.bham.ac.uk//id/eprint/4252/.

Abstract:

Recent research on moral judgements tends to emphasise the role of intuition, emotion and non-deliberative gut-reactions to moral violations. The aim of this thesis was to investigate instances, during the judgement process and in resulting behaviour, when deliberative consideration and processing of behavioural and contextual information (i.e., information beyond initial gut-reactions and intuitions) occurred. Specifically, this thesis examined the effects of reasoning about behavioural and contextual information pre-judgement, the desires and needs for and engagement with behavioural information, and the effects of behavioural and contextual information on eliciting moral behaviour. Across seven experiments, I demonstrated (1) that age-related changes in the ability to reason about the means through which a negative outcome occurred influenced attributions of blameworthiness, (2) that post-judgement information selection and engagement differed depending on the moral violation judged, emotions elicited from the violations and the amount of reported epistemic certainty, and (3) that the presence of information about the outcome of a morally virtuous act influenced later helping behaviour. These findings suggest that deliberative reasoning and processing of behavioural and contextual information can occur and influence judgements and behaviour at different stages in the judgement process.

17

Poleto, Frederico Zanqueta. "Análise de dados categorizados com omissão." Universidade de São Paulo, 2006. http://www.teses.usp.br/teses/disponiveis/45/45133/tde-04122007-192457/.

Abstract:

This work addresses theoretical, computational and applied aspects of classical analyses of categorical data with missingness. A literature review is presented while the missingness mechanisms are introduced, showing their characteristics and implications for the inferences of interest by means of an example with two dichotomous response variables and simulation studies. The modelling described in Paulino (1991, Brazilian Journal of Probability and Statistics 5, 1-42) is extended from the multinomial distribution to the product-multinomial distribution to allow explanatory variables to be included in the analysis. The results are developed in a matrix formulation suited to computational implementation, which is carried out by building a library for the R statistical environment; the library is made available to facilitate the inferences described in this dissertation. The application of the theory is illustrated by five examples with diverse characteristics, fitting structural linear models (marginal homogeneity), log-linear models (independence, common adjacent odds ratio) and functional linear models (kappa, weighted kappa, sensitivity/specificity, positive/negative predictive value) for the categorization probabilities. The missingness patterns are also varied, with missingness in one or two variables, confounding of neighbouring cells, and with or without subpopulations.
We consider theoretical, computational and applied aspects of classical categorical data analyses with missingness. We present a literature review while introducing the missingness mechanisms, highlighting their characteristics and implications for the inferences of interest by means of an example involving two binary responses and simulation studies. We extend the multinomial modeling scenario described in Paulino (1991, Brazilian Journal of Probability and Statistics 5, 1-42) to the product-multinomial setup to allow for the inclusion of explanatory variables. We develop the results in matrix formulation and implement the computational procedures via subroutines written for the R statistical environment. We illustrate the application of the theory by means of five examples with different characteristics, fitting structural linear (marginal homogeneity), log-linear (independence, constant adjacent odds ratio) and functional linear models (kappa, weighted kappa, sensitivity/specificity, positive/negative predictive value) for the marginal probabilities. The missingness patterns include missingness in one or two variables and confounded neighbouring cells, with or without explanatory variables.
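For orientation, two of the quantities targeted by the functional linear models (kappa and sensitivity/specificity) can be computed from a fully observed table of counts as sketched below. This is only a complete-data illustration with hypothetical counts and function names. It does not handle missingness, which is the subject of the dissertation, and it is written in Python rather than with the R library the abstract describes.

```python
def cohen_kappa(table):
    """Cohen's kappa for a square table of counts (rows: rater A, columns: rater B)."""
    k = len(table)
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(k)) / n
    row_tot = [sum(table[i]) for i in range(k)]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance under independent marginals.
    expected = sum(row_tot[i] * col_tot[i] for i in range(k)) / n ** 2
    return (observed - expected) / (1 - expected)


def sensitivity_specificity(tp, fn, fp, tn):
    """Sensitivity and specificity from the four cells of a diagnostic 2x2 table."""
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical complete-data counts.
kappa = cohen_kappa([[20, 5], [10, 15]])
sens, spec = sensitivity_specificity(tp=40, fn=10, fp=5, tn=45)
```

With these counts the observed agreement is 0.7 and the chance agreement is 0.5, so kappa is 0.4; the diagnostic table gives a sensitivity of 0.8 and a specificity of 0.9.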

18

Haouas, Nabiha. "Wind energy analysis and change point analysis." Thesis, Clermont-Ferrand 2, 2015. http://www.theses.fr/2015CLF22554.

Abstract:

L’énergie éolienne, l’une des énergies renouvelables les plus compétitives, est considérée comme une solution qui remédie aux inconvénients de l’énergie fossile. Pour une meilleure gestion et exploitation de cette énergie, des prévisions de sa production s’avèrent nécessaires. Les méthodes de prévisions utilisées dans la littérature permettent uniquement une prévision de la moyenne annuelle de cette production. Certains travaux récents proposent l’utilisation du Théorème Central Limite (TCL), sous des hypothèses non classiques, pour l’estimation de la production annuelle moyenne de l’énergie éolienne ainsi que sa variance pour une seule turbine. Nous proposons dans cette thèse une extension de ces travaux à un parc éolien par relaxation de l’hypothèse de stationnarité la vitesse du vent et la production d’énergie, en supposant que ces dernières sont saisonnières. Sous cette hypothèse la qualité de la prévision annuelle s’améliore considérablement. Nous proposons aussi de prévoir la production d’énergie éolienne au cours des quatre saisons de l’année. L’utilisation du modèle fractal, nous permet de trouver une division ”naturelle” de la série de la vitesse du vent afin d’affiner l’estimation de la production éolienne en détectant les points de ruptures. Dans les deux derniers chapitres, nous donnons des outils statistiques de la détection des points de ruptures et d’estimation des modèles fractals
Wind energy, one of the most competitive renewable energies, is regarded as a remedy for the drawbacks of fossil energy. To manage and exploit this energy better, forecasts of its production are necessary. The forecasting methods used in the literature only allow the annual mean of this production to be forecast. Some recent works propose using the Central Limit Theorem (CLT), under non-classical hypotheses, to estimate the mean annual production of wind energy and its variance for a single turbine. In this thesis we propose an extension of these works to a wind farm by relaxing the hypothesis of stationarity of the wind speed and the power production, supposing instead that both are seasonal. Under this hypothesis the quality of the annual forecast improves considerably. We also propose forecasting wind power production over the four seasons of the year. The use of the fractal model allows us to find a "natural" division of the wind speed series and so refine the estimation of wind production by detecting change points. Statistical tools for change point detection and for the estimation of fractal models are presented in the last two chapters.

19

LI, SHAO-PENG, and 李少芃. "Categorical Variable Selection and Level Clustering in Count Data." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/umr3yz.

20

Liu, Chen-Ying, and 劉振熒. "A Model Selection Technique between Two Empirical Bayes Models for Categorical Data." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/98221355733341962404.

Abstract:

Master's
National Chiao Tung University
Institute of Statistics
93
In the paper, first of all, a model selection technique between two empirical Bayes models for categorical data in manufacturing is proposed. Next, two useful empirical Bayes models for categorical data in manufacturing are introduced. Finally, the performance of the proposed method is illustrated by an example through simulations.

21

"Pre- and post-copulatory sexual selection in the tortoise beetle Acromis sparsa (Coleoptera: Chrysomelidae)." UNIVERSITY OF MONTANA, 2009. http://pqdtopen.proquest.com/#viewpdf?dispub=3338789.

22

Li, Yu-Ching, and 李俞青. "Optimal Selection of Indicators and Portfolio by Genetic Algorithms pre- and post- the Financial Crisis." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/47e33d.

Abstract:

Master's
Asia University
In-service Master's Program, Department of Finance
103
Based on a literature review and multiple fundamental and chip indicators, this paper employs a genetic algorithm (GA) to screen for the best analysis indicators and threshold values and to determine the most suitable investment portfolio. The data cover 2007 to 2009, when the Financial Crisis occurred, with the annual returns of individual shares as the target. Using the Sortino ratio as the basis, the average return of the selected blue chips is calculated. Empirical results show that before the crisis the GA return is the largest, followed by that of the index, while the traditional methods have the lowest return. During the crisis the GA return is the largest, followed by that of the traditional methods, and the index has the lowest return. After the crisis the GA return is again the largest, followed by that of the traditional methods, and the index has the lowest return. As for the best analysis indicators selected with GA, the best indicator is free cash flow before the crisis, free cash flow and earnings per share during the crisis, and free cash flow and the securities-cash ratio after the crisis. Whether before, during or after the crisis, GA has better returns than the traditional methods and the index. Moreover, the traditional methods have better returns than the index both during and after the crisis, suggesting that the financial crisis affects the return of the investment portfolio. Free cash flow is the best analysis indicator in all three periods, which deserves attention from investors.

23

MacEachern, Kathryn Anne. "A Comparison of Categorical vs. Fractional Parental Allocation Based on Microsatellite Markers to Estimate Reproductive Success and Inbreeding Levels Over Three Generations of Selective Breeding in a Closed Population of Rainbow Trout (Oncorhynchus mykiss)." 2011. http://hdl.handle.net/10222/14356.

Abstract:

The aim of this project was to assess three DNA-marker-based pedigree reconstruction approaches and their associated challenges, strengths and weaknesses by conducting a retrospective analysis of a real, three-generation rainbow trout (Oncorhynchus mykiss) pedigree from the SPA hatchery. Molecular genetic data at as few as three or four loci were used to infer relatedness among individuals and between generations in the reconstruction of the three full pedigrees. Parentage and pedigree reconstruction were estimated for the quasi-categorical (exclusion-based and LOD-based) approaches via the program CERVUS 3.0 and for the fractional approach via a software program (PIPEDIGREE) developed for this project. The fractional pedigree method appeared superior, particularly for the estimation of inbreeding levels. This retrospective analysis was able to demonstrate, under different pedigree reconstruction approaches, that the semi-selective, on-farm breeding scheme implemented at the time was successful in limiting the increase in inbreeding and in identifying possibly superior broodstock.
MSc Thesis

24

Silvestre, Cláudia Marisa Vasconcelos. "Clustering with discrete mixture models: An integrated approach for model selection." Doctoral thesis, 2014. http://hdl.handle.net/10071/9991.

Abstract:

A investigação em analise de agrupamento (cluster analysis) continua em curso. Identificar o número de grupos, bem como seleccionar um subconjunto de variáveis relevantes a partir de dados de uma amostra constituem domínios de investigação ativa em agrupamento. Grande parte dos métodos desenvolvidos para abordar estas temáticas refere-se a dados contínuos, e não podem ser directamente aplicados ao agrupamento de dados categoriais. Este trabalho, pretende ser um contributo nesta área, abordando o agrupamento de dados categoriais.
Research on cluster analysis continues to develop. Identifying the number of clusters and selecting a subset of relevant variables from the sample data are active areas of research on clustering methods. The approaches proposed for addressing these issues are mostly designed for continuous data and cannot be directly applied to clustering categorical data. This work aims to contribute to this area by addressing the clustering of categorical data.

25

Ranjineh, Khojasteh Enayatollah. "Geostatistical three-dimensional modeling of the subsurface unconsolidated materials in the Göttingen area." Doctoral thesis, 2013. http://hdl.handle.net/11858/00-1735-0000-0001-BB9A-B.

Abstract:

The aim of this work was to build a three-dimensional subsurface model of the Göttingen region based on a geotechnical classification of the unconsolidated sediments. The materials studied range from loose sediments to solid rock, but are referred to in this work as soil, soil classes or soil categories. This study evaluates different ways of capturing heterogeneous subsurfaces with geostatistical methods and simulations. Such modeling is a fundamental tool in geotechnical engineering, mining, oil prospecting and hydrogeology, among other fields. Detailed modeling of the required continuous parameters, such as the porosity, permeability or hydraulic conductivity of the subsurface, requires an exact determination of the boundaries of facies and soil categories. The focus of this work is on the three-dimensional modeling of unconsolidated rocks and their classification based on geostatistically determined characteristic values. The methods used were conventional pixel-based methods and transition-probability-based Markov chain models. After a general statistical evaluation of the parameters, the presence or absence of a soil category along the boreholes is described by indicator parameters. The indicator of a category at a sample point is one if the category is present and zero if it is absent. Intermediate states can also be defined. For example, a value of 0.5 is assigned if two categories are present but their exact proportions are not known. To improve the stationarity properties of the indicator variables, the initial coordinates are transformed into a new system, proportional to the top and bottom of the corresponding model layer.
In the new coordinate space, the corresponding indicator variograms are computed for each category in different spatial directions. For simplicity, semivariograms are also referred to as variograms in this work. Indicator kriging is used to compute the probability of each category at a model node. Based on the probabilities computed in the previous step for the existence of a model category, the most probable category is assigned to the node. The indicator variogram models and indicator kriging parameters used were validated and optimized. The reduction of the model nodes and its effect on the precision of the model were also investigated. To resolve small-scale variations of the categories, the developed methods were applied and compared. "Sequential Indicator Simulation" (SISIM) and "Transition Probability Markov Chain" (TP/MC) were used as simulation methods. The studies carried out show that the TP/MC method generally gives good results, particularly in comparison with the SISIM method. For comparison, alternative methods for similar problems are evaluated and their inefficiency is demonstrated. An improvement of the TP/MC methods is also described and supported with results, and further suggestions for modifying the methods are given. Based on the results, the method is recommended for similar problems. For this purpose, simulation selection, tests and evaluation systems are proposed and further areas of study are highlighted. A computer-aided implementation of the procedure covering all simulation steps could be developed in the future to increase efficiency. The results of this study and subsequent investigations could be relevant to a wide range of problems in mining, the petroleum industry, geotechnical engineering and hydrogeology.
