A selection of scholarly literature on the topic "Mixed categorical variables"

Author: Grafiati

Published: July 7, 2024

Browse the lists of current articles, books, dissertations, conference abstracts, and other scholarly sources on the topic "Mixed categorical variables".

Contents

Journal articles
Dissertations
Book chapters
Conference abstracts

Journal articles on the topic "Mixed categorical variables":

1

McCane, Brendan, and Michael Albert. "Distance functions for categorical and mixed variables." Pattern Recognition Letters 29, no. 7 (May 2008): 986–93. http://dx.doi.org/10.1016/j.patrec.2008.01.021.

2

Horníček, Jaroslav, and Hana Řezanková. "Missing Data Imputation for Categorical Variables." Statistika: Statistics and Economy Journal 102, no. 3 (September 2022): 249–60. http://dx.doi.org/10.54694/stat.2022.3.

Abstract:

Dealing with missing data is a crucial part of everyday data analysis. The IMIC algorithm is a missing data imputation method that can handle mixed numerical and categorical datasets. However, the categorical data are crucial for this work. This paper proposes a new improvement of the IMIC algorithm. The two proposed modifications consider the number of categories in each categorical variable. Based on this information, a factor that modifies the original measure is computed. The factor equation is inspired by the Eskin similarity measure, which is known from the hierarchical clustering of categorical data. The results show that as the missing value ratio in the dataset grows, better results are achieved using the second modification. The paper also briefly analyzes the advantages and disadvantages of using the IMIC algorithm.
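The abstract does not give the paper's factor equation, so the following is only a minimal sketch of the Eskin similarity measure itself, in the form it is commonly stated for categorical data: matching values score 1, and mismatching values score n²/(n² + 2), where n is the number of categories of the variable. The function names and the simple averaging across variables are illustrative assumptions, not the IMIC modification.

```python
def eskin_similarity(x, y, n_categories):
    """Eskin similarity for one categorical variable.

    Matches score 1; mismatches score n^2 / (n^2 + 2), so a mismatch on a
    variable with many categories is penalised less than a mismatch on a
    variable with few categories.
    """
    if x == y:
        return 1.0
    n2 = n_categories ** 2
    return n2 / (n2 + 2)


def eskin_record_similarity(a, b, category_counts):
    """Average per-variable Eskin similarity of two records (assumed aggregation)."""
    sims = [eskin_similarity(x, y, n) for x, y, n in zip(a, b, category_counts)]
    return sum(sims) / len(sims)


# Hypothetical records: (colour with 3 categories, size with 2 categories).
similarity = eskin_record_similarity(("red", "small"), ("red", "large"), (3, 2))
```

Because the mismatch score depends on the number of categories, this is one way the "number of categories in each categorical variable" can enter a similarity-based weighting.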

3

Zuo, Yan, Vu Nguyen, Amir Dezfouli, David Alexander, Benjamin Ward Muir, and Iadine Chades. "Mixed-Variable Black-Box Optimisation Using Value Proposal Trees." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 9 (June 26, 2023): 11506–14. http://dx.doi.org/10.1609/aaai.v37i9.26360.

Abstract:

Many real-world optimisation problems are defined over both categorical and continuous variables, yet efficient optimisation methods such as Bayesian Optimisation (BO) are ill-equipped to handle such mixed-variable search spaces. The optimisation breadth introduced by categorical variables in the mixed-input setting has seen recent approaches operating on local trust regions, but these methods can be greedy in suboptimal regions of the search space. In this paper, we adopt a holistic view and aim to consolidate optimisation of the categorical and continuous sub-spaces under a single acquisition metric. We develop a tree-based method which retains a global view of the optimisation spaces by identifying regions in the search space with high potential candidates which we call value proposals. Our method uses these proposals to make selections on both the categorical and continuous components of the input. We show that this approach significantly outperforms existing mixed-variable optimisation approaches across several mixed-variable black-box optimisation tasks.

4

Lee, Sik-Yum, Xin-Yuan Song, and Bin Lu. "Discriminant Analysis Using Mixed Continuous, Dichotomous, and Ordered Categorical Variables." Multivariate Behavioral Research 42, no. 4 (December 28, 2007): 631–45. http://dx.doi.org/10.1080/00273170701710114.

5

Di Nuzzo, Cinzia. "Advancing Spectral Clustering for Categorical and Mixed-Type Data: Insights and Applications." Mathematics 12, no. 4 (February 6, 2024): 508. http://dx.doi.org/10.3390/math12040508.

Abstract:

This study focuses on adapting spectral clustering, a numeric data-clustering technique, for categorical and mixed-type data. The method enhances spectral clustering for categorical and mixed-type data with novel kernel functions, showing improved accuracy in real-world applications. Despite achieving better clustering for datasets with mixed variables, challenges remain in identifying suitable kernel functions for categorical relationships.

6

Morales, D., L. Pardo, and K. Zografos. "Informational distances and related statistics in mixed continuous and categorical variables." Journal of Statistical Planning and Inference 75, no. 1 (November 1998): 47–63. http://dx.doi.org/10.1016/s0378-3758(98)00120-7.

7

Ng, Michael K., Elaine Y. Chan, Meko M. C. So, and Wai-Ki Ching. "A semi-supervised regression model for mixed numerical and categorical variables." Pattern Recognition 40, no. 6 (June 2007): 1745–52. http://dx.doi.org/10.1016/j.patcog.2006.06.018.

8

Leung, Chi-Ying. "Regularized classification for mixed continuous and categorical variables under across-location heteroscedasticity." Journal of Multivariate Analysis 93, no. 2 (April 2005): 358–74. http://dx.doi.org/10.1016/j.jmva.2004.03.001.

9

Munoz Zuniga, Miguel, and Delphine Sinoquet. "Global optimization for mixed categorical-continuous variables based on Gaussian process models with a randomized categorical space exploration step." INFOR: Information Systems and Operational Research 58, no. 2 (March 19, 2020): 310–41. http://dx.doi.org/10.1080/03155986.2020.1730677.

10

Han, Jisoo, and HyungJun Cho. "A Study on Cluster Analysis of Mixed Data with Continuous and Categorical Variables." Korean Data Analysis Society 20, no. 4 (August 31, 2018): 1769–80. http://dx.doi.org/10.37727/jkdas.2018.20.4.1769.

Dissertations on the topic "Mixed categorical variables":

1

Adamec, Vaclav. "The Effect of Maternal and Fetal Inbreeding on Dystocia, Calf Survival, Days to First Service and Non-Return Performance in U.S. Dairy Cattle." Diss., Virginia Tech, 2002. http://hdl.handle.net/10919/25999.

Abstract:

Intensive selection for increased milk production over many generations has led to growing genetic similarity and increased relationships in dairy population. In the current study, inbreeding depression was estimated for number of days to first service, summit milk, conception by 70 days non-return, and calving rate with a linear mixed model (LMM) approach and for calving difficulty, calf mortality with a Bayesian threshold model (BTM) for categorical traits. Effectiveness of classical and unknown parentage group procedures to estimate inbreeding coefficients was evaluated depending on completeness of a 5-generation pedigree. A novel method derived from the classical formula to estimate inbreeding was utilized to evaluate completeness of pedigrees. Two different estimates of maternal inbreeding were fitted in separate models as a linear covariate in combined LMM analyses (Holstein registered and grade cows and Jersey cows) or separate analyses (registered Holstein cows) by parity (1-4) with fetal inbreeding. Impact of inbreeding type, model, data structure, and treatment of herd-year-season (HYS) on magnitude and size of inbreeding depression were assessed. Grade Holstein datasets were sampled and analyzed by percentage of pedigree present (0-30%, 30-70% and 70-100%). BTM analyses (sire-mgs) were performed using Gibbs sampling for parities 1, 2 and 3 fitting maternal inbreeding only. In LMM analyses of grade data, the least pedigree and diagonal A matrix performed the worst. Significant inbreeding effects were obtained in most traits in cows of parity 1. Fetal inbreeding depression was mostly lower than that from maternal inbreeding. Inbreeding depression in binary traits was the most difficult to evaluate. Analyses with non-additive effects included in LMM, for data by inbreeding level and by age group should be preferred to estimate inbreeding depression. In BTM inbreeding effects were strongly related to dam parity and calf sex. 
Largest effects were obtained from parity 1 cows giving birth to male calves (0.417% and 0.252% for dystocia and calf mortality) and then births to female calves (0.300% and 0.203% for dystocia and calf mortality). Female calves from mature cows were the least affected (0.131% and 0.005% for dystocia and calf mortality). Data structure was found to be a very important factor to attainment of convergence in distribution.
Ph. D.

2

Barjhoux, Pierre-Jean. "Towards efficient solutions for large scale structural optimization problems with categorical and continuous mixed design variables." Thesis, Toulouse, ISAE, 2020. http://depozit.isae.fr/theses/2020/2020_Barjhoux_Pierre-Jean.pdf.

Abstract:

In the aeronautical industry, structural optimization problems can involve changes of materials, stiffener types, and element sizes. This work therefore proposes to solve large-scale problems (mass minimization) with respect to categorical and continuous variables, subject to stress and displacement constraints. Three algorithms are presented and discussed in the manuscript on increasingly complex test cases. First, an algorithm based on branch and bound was implemented. A formulation of a problem dedicated to computing lower bounds on the optimal mass is proposed. Although the algorithm finds optimal solutions, the computational cost grows exponentially with the number of elements. The second algorithm relies on a bi-level formulation of the original problem, in which the upper-level problem consists of minimizing a first-order approximation of the lower-level result. The computational cost scales almost linearly with the number of elements and categorical values. Finally, a third algorithm takes advantage of a reformulation of the mixed categorical-continuous problem as a mixed bi-level problem with continuously relaxable integer variables. The numerical test cases demonstrate the solution of a problem with more than one hundred elements. In addition, the computational cost is almost independent of the number of categorical values available per element.
Nowadays in the aircraft industry, structural optimization problems can be really complex and combine changes in choices of materials, stiffeners, or sizes/types of elements. In this work, it is proposed to solve large scale structural weight minimization problems with both categorical and continuous variables, subject to stress and displacements constraints. Three algorithms have been proposed. As a first attempt, an algorithm based on the branch and bound generic framework has been implemented. A specific formulation to compute lower bounds has been proposed. According to the numerical tests, the algorithm returned the exact optima. However, the exponential scalability of the computational cost with respect to the number of structural elements prevents from an industrial application. The second algorithm relies on a bi-level formulation of the mixed categorical problem. The master full categorical problem consists of minimizing a first order like approximation of the slave problem with respect to the categorical design variables. The method offers a quasi-linear scaling of the computational cost with respect to the number of elements and categorical values. Finally, in the third approach the optimization problem is formulated as a bi-level mixed integer non-linear program with relaxable design variables. Numerical tests include an optimization case with more than one hundred structural elements. Also, the computational cost scaling is quasi-independent from the number of available categorical values per element.

3

Saves, Paul. "High dimensional multidisciplinary design optimization for eco-design aircraft." Electronic Thesis or Diss., Toulouse, ISAE, 2024. http://www.theses.fr/2024ESAE0002.

Abstract:

A significant and growing interest in improving vehicle design processes can be observed today in the field of multidisciplinary optimization, thanks to the development of new tools and techniques. Concretely, in aerostructural design, the aerodynamic and structural variables influence one another and jointly affect quantities of interest such as weight or fuel consumption. Multidisciplinary optimization is thus a powerful tool for making interdisciplinary trade-offs. In aircraft design, the multidisciplinary process generally involves mixed design variables, both continuous and categorical. For example, the size of an aircraft's structural parts can be described with continuous variables, the number of panels is an integer, and the list of cross-sections or the choice of materials corresponds to categorical choices. The objective of this thesis is to propose an efficient approach for optimizing a black-box multidisciplinary model when the optimization problem is constrained and involves a large number of mixed design variables (typically 100 variables). The Bayesian optimization approach used consists of a sequential adaptive enrichment of a metamodel to approach the optimum of the objective function while satisfying the constraints. Gaussian process surrogate models are among the most widely used in engineering problems to replace computationally expensive high-fidelity models. Efficient global optimization is a heuristic Bayesian optimization method designed for the global solution of expensive-to-evaluate optimization problems, yielding good-quality results quickly.
However, like any other global optimization method, it suffers from the curse of dimensionality, meaning that its performance is satisfactory for low-dimensional problems but deteriorates rapidly as the dimension of the search space increases. This is all the more true because design problems for complex systems include both continuous and categorical variables, which further increases the size of the search space. In this thesis, we propose methods to significantly reduce the number of design variables, such as active learning techniques like partial least squares regression. This work thus adapts Bayesian optimization to discrete variables and to high dimension in order to reduce the number of evaluations when optimizing innovative, less polluting aircraft concepts such as the "DRAGON" hybrid-electric configuration.
Nowadays, there has been significant and growing interest in improving the efficiency of vehicle design processes through the development of tools and techniques in the field of multidisciplinary design optimization (MDO). In fact, when optimizing both the aerodynamics and structures, one needs to consider the effect of the aerodynamic shape variables and structural sizing variables on the weight which also affects the fuel consumption. MDO arises as a powerful tool that can perform this trade-off automatically. The objective of the Ph. D project is to propose an efficient approach for solving an aero-structural wing optimization process at the conceptual design level. The latter is formulated as a constrained optimization problem that involves a large number of design variables (typically 700 variables). The targeted optimization approach is based on a sequential enrichment (typically efficient global optimization (EGO)), using an adaptive surrogate model. Kriging surrogate models are one of the most widely used in engineering problems to substitute time-consuming high fidelity models. EGO is a heuristic method, designed for the solution of global optimization problems that has performed well in terms of quality of the solution computed. However, like any other method for global optimization, EGO suffers from the curse of dimensionality, meaning that its performance is satisfactory on lower dimensional problems, but deteriorates as the dimensionality of the optimization search space increases. For realistic aircraft wing design problems, the typical size of the design variables exceeds 700 and, thus, trying to solve directly the problems using EGO is ruled out. In practical test cases, high dimensional MDO problems may possess a lower intrinsic dimensionality, which can be exploited for optimization. In this context, a feature mapping can then be used to map the original high dimensional design variable onto a sufficiently small design space. 
Most of the existing approaches in the literature use random linear mapping to reduce the dimension; sometimes active learning is used to build this linear embedding. Generalizations to non-linear subspaces are also proposed using the so-called variational autoencoder. For instance, a composition of Gaussian processes (GP), referred to as deep GP, can be very useful. In this PhD thesis, we will investigate efficient parameterization tools to significantly reduce the number of design variables by using active learning techniques. An extension of the method could also be proposed to handle mixed continuous and categorical inputs using some previous works on low dimensional problems. Practical implementations within the OpenMDAO framework (an open source MDO framework developed by NASA) are expected.

4

Hjerpe, Adam. "Computing Random Forests Variable Importance Measures (VIM) on Mixed Numerical and Categorical Data." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-185496.

Abstract:

The Random Forest model is commonly used as a predictor function, and the model has proven useful in a variety of applications. Its popularity stems from the combination of high prediction accuracy, the ability to model high dimensional complex data, and applicability under predictor correlations. This report investigates the random forest variable importance measure (VIM) as a means to find a ranking of important variables. The robustness of the VIM under imputation of categorical noise, and the capability to differentiate informative predictors from non-informative variables, are investigated. The selection of variables may improve robustness of the predictor, improve the prediction accuracy, reduce computational time, and may serve as an exploratory data analysis tool. In addition, the partial dependency plot obtained from the random forest model is examined as a means to find underlying relations in a non-linear simulation study.
Random Forest (RF) is a popular predictive model that has shown good results in a large number of application studies. The model gives high prediction accuracy, is able to model complex high-dimensional data, and has also shown good results with intercorrelated predictor variables. This project investigates a measure, the variable importance measure (VIM) obtained from the RF model, for computing the degree of association between predictor variables and the target variable. The project investigates the sensitivity of VIM to qualitative predictor noise and examines VIM's ability to differentiate predictive variables from variables that, with respect to the target variable, describe only noise. Differentiating predictive variables in supervised learning can be used to increase the robustness of classifiers, increase prediction accuracy, and reduce data dimensionality, and VIM can be used as a tool for exploring relationships between predictor variables and the target variable.

5

"Empirical investigation of the performance of Mplus for analyzing structural equation model with mixed continuous and ordered categorical variables." 2003. http://library.cuhk.edu.hk/record=b5891552.

Abstract:

Lam Ho-Suen Joffee.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2003.
Includes bibliographical references (leaf 40).
Abstracts in English and Chinese.
Chapter 1 --- Introduction
Chapter 2 --- Review of Mplus
Chapter 3 --- Design of the Simulation Study
Chapter 3.1 --- Simulation Design
Chapter 3.2 --- Covariance Structure Analysis and Mplus Restriction
Chapter 3.3 --- Implementation
Chapter 4 --- Method of Evaluation
Chapter 4.1 --- Accuracy of Parameter Estimates
Chapter 4.2 --- Distribution of the Goodness-of-fit Statistic
Chapter 4.3 --- Precision of Standard Errors
Chapter 4.4 --- Number of Replications
Chapter 5 --- Results of the Simulation Study
Chapter 5.1 --- Accuracy of the Parameter Estimates
Chapter 5.2 --- Distribution of the Goodness-of-fit Statistic
Chapter 5.3 --- Precision of the Standard Error
Chapter 5.4 --- Results when the Sample Size is Extremely Large
Chapter 5.5 --- Conclusion
Chapter 6 --- Additional Simulation Study
Chapter 6.1 --- Precision of Standard Error when the Model Consists of Only Continuous and Only Ordinal Variables
Chapter 6.2 --- Comparison of the Simulation Results of Mplus and LISREL
Chapter 6.3 --- Conclusion
Chapter 7 --- Conclusion and Discussion
Chapter A --- Mplus Sample Program (Condition C1 S2 N=500)
Chapter B --- PRELIS Sample Program (Condition C1 S1 N=500)

Book chapters on the topic "Mixed categorical variables":

1

Salinas Ruíz, Josafhat, Osval Antonio Montesinos López, Gabriela Hernández Ramírez, and Jose Crossa Hiriart. "Generalized Linear Mixed Models for Categorical and Ordinal Responses." In Generalized Linear Mixed Models with Applications in Agriculture and Biology, 321–76. Cham: Springer International Publishing, 2023. http://dx.doi.org/10.1007/978-3-031-32800-8_8.

Abstract:

According to Agresti (2013), a multinomial distribution is a generalization of a binomial distribution to cases with more than two possible ordered (ordinal) or unordered (nominal) outcomes. Given a response with more than two possible outcomes and independent trials with the same category probabilities for each trial, the distribution of counts across categories follows a multinomial distribution. Quinn and Keough (2002) note that several methods exist for multinomial data analysis. The most common form of categorical data analysis in biological sciences, which results in frequency counts, is creating cross-tabulations or contingency tables and using chi-squared tests to examine associations between two or more categorical variables. However, such an approach is ill suited for a study aimed at estimating the response when there is a change in the explanatory variable(s), as contingency tables are used to analyze the association between variables without considering a predictor or response variable. In this analysis, the results are valid as long as less than 20% of the cells have an expected count less than five and none are less than one (Logan 2010). Fisher’s exact test extends the chi-squared test in studies involving small sample sizes.
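The validity rule quoted above (fewer than 20% of cells with an expected count below five, and no expected count below one) can be checked directly from a contingency table. The sketch below is a minimal illustration under that reading of the rule; the function names and the example tables are hypothetical, and expected counts are computed in the usual way as row total × column total / grand total.

```python
def expected_counts(table):
    """Expected cell counts under independence for a 2-D contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_squared_rule_ok(table):
    """True if fewer than 20% of expected counts are below 5 and none is below 1."""
    cells = [e for row in expected_counts(table) for e in row]
    share_small = sum(e < 5 for e in cells) / len(cells)
    return share_small < 0.20 and min(cells) >= 1


# Hypothetical tables: one balanced, one with sparse cells.
balanced_ok = chi_squared_rule_ok([[20, 30], [30, 20]])
sparse_ok = chi_squared_rule_ok([[1, 2], [3, 30]])
```

When the check fails, the abstract points to Fisher's exact test as the alternative for small samples.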

2

Salinas Ruíz, Josafhat, Osval Antonio Montesinos López, Gabriela Hernández Ramírez, and Jose Crossa Hiriart. "Generalized Linear Models." In Generalized Linear Mixed Models with Applications in Agriculture and Biology, 43–84. Cham: Springer International Publishing, 2023. http://dx.doi.org/10.1007/978-3-031-32800-8_2.

Abstract:

In the general linear model (which is not highly general) y = Xβ + ϵ, the response variables are normally distributed, with constant variance across the values of all the predictor variables, and are linear functions of the predictor variables. Transformations of data are used to try to force the data into a normal linear regression model or to find a transformation of a non-normal response variable (discrete, categorical, positive continuous scale, etc.) that is linearly related to the predictor variables; however, this is no longer necessary. Instead of using a normal distribution, a positively skewed distribution with values that are positive real numbers can be selected. Generalized linear models (GLMs) go beyond linear models by taking into account that the response variables are not of continuous scale (not normally distributed) and are heteroscedastic, and that there is a linear relationship between the mean of the response variable and the predictor or explanatory variables.

3

Caruso, Giulia, Adelia Evangelista, and Stefano Antonio Gattone. "Profiling visitors of a national park in Italy through unsupervised classification of mixed data." In Proceedings e report, 135–40. Florence: Firenze University Press, 2021. http://dx.doi.org/10.36253/978-88-5518-304-8.27.

Abstract:

Cluster analysis has long been an effective tool for analysing data. Several disciplines, such as marketing, psychology and computer science, to mention just a few, have benefited from its contribution over time. Traditionally, this kind of algorithm concentrates on only numerical or only categorical data at a time. In this work, instead, we analyse a dataset composed of mixed data, namely both numerical and categorical variables. More precisely, we focus on profiling visitors of the National Park of Majella in the Abruzzo region of Italy, whose observations are characterized by variables such as gender, age, profession, expectations and satisfaction rate on park services. Applying a standard clustering procedure would be wholly inappropriate in this case. Therefore, we propose an unsupervised classification of mixed data, a specific procedure capable of processing both numerical and categorical variables simultaneously, releasing truly precious information. In conclusion, our application emphasizes how cluster analysis for mixed data can uncover particularly informative patterns, laying the groundwork for accurate customer profiling, the starting point for a detailed marketing analysis.
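The abstract does not say which dissimilarity the authors use. As a generic illustration of how numerical and categorical variables can be handled together, the sketch below uses a Gower-style distance, which is a common choice for mixed data and is swapped in here as an assumption: numerical variables contribute a range-scaled absolute difference, categorical variables contribute 0 on a match and 1 on a mismatch, and the contributions are averaged. The visitor records and variable ranges are hypothetical.

```python
def gower_distance(a, b, is_numeric, ranges):
    """Gower-style distance between two mixed-type records.

    is_numeric[i] says whether variable i is numerical; ranges[i] is the
    observed range (max - min) of a numerical variable and is ignored for
    categorical variables.
    """
    parts = []
    for x, y, numeric, rng in zip(a, b, is_numeric, ranges):
        if numeric:
            # Range-scaled absolute difference, in [0, 1] within the observed range.
            parts.append(abs(x - y) / rng if rng else 0.0)
        else:
            # Simple matching: 0 if the categories agree, 1 otherwise.
            parts.append(0.0 if x == y else 1.0)
    return sum(parts) / len(parts)


# Hypothetical visitors described by (age, gender, profession); age range is 40 years.
visitor_a = (30, "female", "teacher")
visitor_b = (50, "female", "engineer")
distance = gower_distance(visitor_a, visitor_b, (True, False, False), (40, None, None))
```

A pairwise matrix of such distances can then be passed to any clustering method that accepts precomputed dissimilarities.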

4

Montesinos López, Osval Antonio, Abelardo Montesinos López, and Jose Crossa. "Artificial Neural Networks and Deep Learning for Genomic Prediction of Binary, Ordinal, and Mixed Outcomes." In Multivariate Statistical Machine Learning Methods for Genomic Prediction, 477–532. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-030-89010-0_12.

Abstract:

In this chapter, we provide the main elements for implementing deep neural networks in Keras for binary, categorical, and mixed outcomes under feedforward networks as well as the main practical issues involved in implementing deep learning models with binary response variables. The same practical issues are provided for implementing deep neural networks with categorical and count traits under a univariate framework. We follow with a detailed assessment of information for implementing multivariate deep learning models for continuous, binary, categorical, count, and mixed outcomes. In all the examples given, the data came from plant breeding experiments including genomic data. The training process for binary, ordinal, count, and multivariate outcomes is similar to fitting DNN models with univariate continuous outcomes, since once we have the data to be trained, we need to (a) define the DNN model in Keras, (b) configure and compile the model, (c) fit the model, and finally, (d) evaluate the prediction performance in the testing set. In the next section, we provide illustrative examples of training DNN for binary outcomes in Keras R (Chollet and Allaire, Deep learning with R. Manning Publications, Manning Early Access Program (MEA), 2017; Allaire and Chollet, Keras: R interface to Keras, 2019).

5

Montesinos López, Osval Antonio, Abelardo Montesinos López, and Jose Crossa. "Reproducing Kernel Hilbert Spaces Regression and Classification Methods." In Multivariate Statistical Machine Learning Methods for Genomic Prediction, 251–336. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-030-89010-0_8.

Abstract:

The fundamentals for Reproducing Kernel Hilbert Spaces (RKHS) regression methods are described in this chapter. We first point out the virtues of RKHS regression methods and why these methods are gaining a lot of acceptance in statistical machine learning. Key elements for the construction of RKHS regression methods are provided, the kernel trick is explained in some detail, and the main kernel functions for building kernels are provided. This chapter explains some loss functions under a fixed model framework with examples of Gaussian, binary, and categorical response variables. We illustrate the use of mixed models with kernels by providing examples for continuous response variables. Practical issues for tuning the kernels are illustrated. We expand the RKHS regression methods under a Bayesian framework with practical examples applied to continuous and categorical response variables and by including in the predictor the main effects of environments, genotypes, and the genotype × environment interaction. We show examples of multi-trait RKHS regression methods for continuous response variables. Finally, some practical issues of kernel compression methods are provided which are important for reducing the computation cost of implementing conventional RKHS methods.

6

Montesinos López, Osval Antonio, Abelardo Montesinos López, and Jose Crossa. "Random Forest for Genomic Prediction." In Multivariate Statistical Machine Learning Methods for Genomic Prediction, 633–81. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-030-89010-0_15.

Abstract:

We give a detailed description of random forest and exemplify its use with data from plant breeding and genomic selection. The motivations for using random forest in genomic-enabled prediction are explained. Then we describe the process of building decision trees, which are a key component for building random forest models. We give (1) the random forest algorithm, (2) the main hyperparameters that need to be tuned, and (3) different splitting rules that are key for implementing random forest models for continuous, binary, categorical, and count response variables. In addition, many examples are provided for training random forest models with different types of response variables with plant breeding data. The random forest algorithm for multivariate outcomes is provided and its most popular splitting rules are also explained. In this case, some examples are provided for illustrating its implementation even with mixed outcomes (continuous, binary, and categorical). Final comments about the pros and cons of random forest are provided.

7

Salinas Ruíz, Josafhat, Osval Antonio Montesinos López, Gabriela Hernández Ramírez, and Jose Crossa Hiriart. "Generalized Linear Mixed Models for Non-normal Responses." In Generalized Linear Mixed Models with Applications in Agriculture and Biology, 113–27. Cham: Springer International Publishing, 2023. http://dx.doi.org/10.1007/978-3-031-32800-8_4.

Abstract:

Generalized linear mixed models (GLMMs) have been recognized as one of the major methodological developments in recent years, which is evidenced by the increased use of such sophisticated statistical tools with broader applicability and flexibility. This family of models can be applied to a wide range of different data types (continuous, categorical (nominal or ordinal), percentages, and counts), and each is appropriate for a specific type of data. This modern methodology allows data to be described through a distribution of the exponential family that best fits the response variable. These complex models were not computationally feasible until recently, when advances in statistical software allowed users to apply GLMMs (Zuur et al. 2009; Stroup 2012; Zuur et al. 2013). Researchers in fields other than statistical science are also interested in modeling the structure of data. For example, in the social sciences there have been applications in the field of education when several tests are applied to students; in longitudinal personality studies when the occurrence of an emotion is repeatedly observed over time in a set of people; and in surveys to investigate the political preference of a population, among others.

8

Inchausti, Pablo. "Accounting for Structure in Mixed/Hierarchical Models." In Statistical Modeling With R, 327–72. Oxford: Oxford University Press, 2022. http://dx.doi.org/10.1093/oso/9780192859013.003.0014.

Abstract:

This chapter defines the fixed (population-level) and random (group-level) effects of categorical explanatory variables in the frequentist and Bayesian frameworks. These effects allow models to account for structure in the data induced by the experimental or survey design, or by the sampling methods. The chapter carefully discusses the interpretation of estimated random (group-level) intercepts and slopes in linear mixed models, and the ambiguities, problems, and controversies related to the definition and interpretation of random effects in the frequentist framework. It illustrates the “shrinkage effect” and its importance in mixed/hierarchical models, and covers the controversies about model selection and the statistical significance of fixed effects in frequentist mixed models, including the use of the Satterthwaite and Kenward–Roger corrections, and the parametric bootstrap. All statistical models are fitted in both the frequentist and Bayesian frameworks.

9

Harding, Courtenay M. "Tough Questions From the Chief of Medical Biostatistics." In Recovery from Schizophrenia, 110–22. New York: Oxford University Press, 2024. http://dx.doi.org/10.1093/oso/9780195380095.003.0009.

Abstract:

The author describes the further development of the Vermont Longitudinal Study, including the acquisition of Takamaru Ashikaga, the chief of medical biostatistics at the University of Vermont. The author details how, although she had taken an undergraduate course in statistics, she still had a lot to learn about the language of statistics, dealing with mixed-method design, using multiple strategies with punch cards, and reading computer printouts. Ashikaga made certain that the study’s data were double coded to reduce errors and then direct coded. The study’s research team also used code books and a master categorical clumping of variables across instrument batteries to get a handle on the 2600 variables in their data. The author goes on to describe taking more graduate courses in statistics.

10

Kuri-Morales, Angel Fernando. "Minimum Database Determination and Preprocessing for Machine Learning." In Advances in Web Technologies and Engineering, 94–131. IGI Global, 2019. http://dx.doi.org/10.4018/978-1-5225-7268-8.ch005.

Abstract:

The exploitation of large databases implies the investment of expensive resources in terms of both storage and processing time. The correct assessment of the data implies that pre-processing steps be taken before its analysis. The transformation of categorical data by adequately encoding every instance of categorical variables is needed. The encoding must preserve the actual patterns while avoiding the introduction of non-existing ones. The authors discuss CESAMO, an algorithm which allows us to statistically identify the pattern-preserving codes. The resulting database is more economical and may encompass mixed databases. Thus, they obtain an optimal transformed representation that is considerably more compact without impairing its informational content. For the equivalence of the original (FD) and reduced data set (RD), they apply an algorithm that relies on a multivariate regression algorithm (AA). Through the combined application of CESAMO and AA, the equivalent behavior of both FD and RD may be guaranteed with a high degree of statistical certainty.

Conference papers on the topic "Mixed categorical variables":

1

Johansson, Sara, Mikael Jern, and Jimmy Johansson. "Interactive Quantification of Categorical Variables in Mixed Data Sets." In 2008 12th International Conference Information Visualisation (IV). IEEE, 2008. http://dx.doi.org/10.1109/iv.2008.33.

2

Uglickich, Evženie, Ivan Nagy, and Tetiana Reznychenko. "Count Predictive Model with Mixed Categorical and Count Explanatory Variables." In 2023 IEEE 12th International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications (IDAACS). IEEE, 2023. http://dx.doi.org/10.1109/idaacs58523.2023.10348640.

3

Saves, Paul, Nathalie Bartoli, Youssef Diouane, Thierry Lefebvre, Joseph Morlier, Christophe David, Eric Nguyen Van, and Sébastien Defoort. "Multidisciplinary design optimization with mixed categorical variables for aircraft design." In AIAA SCITECH 2022 Forum. Reston, Virginia: American Institute of Aeronautics and Astronautics, 2022. http://dx.doi.org/10.2514/6.2022-0082.

4

Saves, Paul, Nathalie Bartoli, Youssef Diouane, Thierry Lefebvre, Joseph Morlier, Christophe David, Eric Nguyen Van, and Sébastien Defoort. "Correction: Multidisciplinary design optimization with mixed categorical variables for aircraft design." In AIAA SCITECH 2022 Forum. Reston, Virginia: American Institute of Aeronautics and Astronautics, 2022. http://dx.doi.org/10.2514/6.2022-0082.c1.

5

Saves, Paul, Youssef Diouane, Nathalie Bartoli, Thierry Lefebvre, and Joseph Morlier. "A general square exponential kernel to handle mixed-categorical variables for Gaussian process." In AIAA AVIATION 2022 Forum. Reston, Virginia: American Institute of Aeronautics and Astronautics, 2022. http://dx.doi.org/10.2514/6.2022-3870.

6

Comlek, Yigitcan, Liwei Wang, and Wei Chen. "Mixed-Variable Global Sensitivity Analysis With Applications to Data-Driven Combinatorial Materials Design." In ASME 2023 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference. American Society of Mechanical Engineers, 2023. http://dx.doi.org/10.1115/detc2023-110756.

Abstract:

Global Sensitivity Analysis (GSA) is the study of the influence of any given inputs on the outputs of a model. In the context of engineering design, GSA has been widely used to understand both individual and collective contributions of design variables to the design objectives. So far, global sensitivity studies have often been limited to design spaces with only quantitative (numerical) design variables. However, many engineering systems contain qualitative (categorical) design variables, either in addition to quantitative design variables or exclusively. In this paper, we integrate the novel Latent Variable Gaussian Process (LVGP) with Sobol’ analysis to develop the first metamodel-based mixed-variable GSA method. Through two analytical case studies, we first validate and demonstrate the effectiveness of our proposed method for mixed-variable problems. Furthermore, while the new metamodel-based mixed-variable GSA method can benefit various engineering design applications, we employ our method with multi-objective Bayesian optimization (BO) to accelerate the Pareto front design exploration in many-level combinatorial design spaces. Specifically, we implement a sensitivity-aware design framework for metal-organic framework (MOF) materials that are constructed only from qualitative design variables and show the benefits of our method for expediting the exploration of novel MOF candidates from a many-level large combinatorial design space.
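To make the Sobol’ part of this abstract concrete, here is a minimal Python sketch that computes first-order indices, S_i = Var(E[Y | X_i]) / Var(Y), for a toy model with one categorical and one numerical input. It is not the paper's LVGP-based method: it evaluates a hypothetical model exactly on a full grid of independent, equally likely input levels, and the names `first_order_sobol`, `toy_model`, `material`, and `thickness` are all assumptions made for illustration.

```python
from itertools import product
from statistics import fmean, pvariance


def first_order_sobol(model, levels):
    """First-order Sobol' indices S_i = Var(E[Y | X_i]) / Var(Y).

    `levels` maps each input name to a finite list of equally likely values.
    Inputs are assumed independent, so averaging over the full grid is exact.
    """
    names = list(levels)
    grid = [dict(zip(names, combo)) for combo in product(*levels.values())]
    outputs = [model(**point) for point in grid]
    total_variance = pvariance(outputs)
    indices = {}
    for name in names:
        # Mean output with this input fixed at each of its levels.
        conditional_means = [
            fmean([y for point, y in zip(grid, outputs) if point[name] == value])
            for value in levels[name]
        ]
        indices[name] = pvariance(conditional_means) / total_variance
    return indices


def toy_model(material, thickness):
    """Hypothetical additive response: a material offset plus the thickness."""
    offset = {"steel": 2.0, "aluminium": 0.0}[material]
    return offset + thickness


sobol = first_order_sobol(
    toy_model,
    {"material": ["steel", "aluminium"], "thickness": [1.0, 2.0, 3.0]},
)
print(sobol)
```

Because the toy model is additive, the two first-order indices sum to one. With interacting inputs they would sum to less than one, and total-order indices would be needed to capture the remainder.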

7

Uyar, A., A. Bener, H. N. Ciray, and M. Bahceci. "A frequency based encoding technique for transformation of categorical variables in mixed IVF dataset." In 2009 Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE, 2009. http://dx.doi.org/10.1109/iembs.2009.5334548.

8

Honda, Katsuhiro, Ryo Uesugi, Hidetomo Ichihashi, and Akira Notsu. "Linear Fuzzy Clustering of Mixed Databases Based on Cluster-wise Optimal Scaling of Categorical Variables." In 2007 IEEE International Fuzzy Systems Conference. IEEE, 2007. http://dx.doi.org/10.1109/fuzzy.2007.4295398.

9

Pathak, Soumi. "Changing trends in coagulation profile of 30 patients undergoing CRS with HIPEC in the peri-operative period." In 16th Annual International Conference RGCON. Thieme Medical and Scientific Publishers Private Ltd., 2016. http://dx.doi.org/10.1055/s-0039-1685386.

Abstract:

Background: With the advent of surgical advancements like HIPEC, several unstudied pathophysiological aspects need to be evaluated. We studied the trends in coagulation profile in patients undergoing CRS with HIPEC in the peri-operative period, utilizing thromboelastography (TEG) in comparison with standard coagulation tests. The utility of TEG as a guide for transfusion of blood products was also evaluated. Materials and Methods: This was a prospective observational cohort study which included 30 consecutive patients undergoing CRS with HIPEC at RGCI in 2015. Methodology: Preoperatively, standard coagulation tests were done as a baseline. Intra-operative arterial blood samples were collected for ABG, PT, APTT, and TEG at the following time points: before the start of HIPEC, after completion of HIPEC, and on postoperative days 1 and 2. Statistical analysis used the chi-square test for categorical variables and the unpaired t-test for continuous variables. Pearson’s correlation coefficient was calculated to analyse the correlation between the variables. P < 0.05 was considered statistically significant. Results: A strong correlation was observed between PT and the R value of TEG. A similar correlation was also observed between the α angle and MA of TEG and the platelet count throughout the peri-operative period. Immediately after HIPEC, the APTT value decreased while the other parameters of the coagulation profile showed a rising trend. The R value showed a rising trend after CRS, a dip after HIPEC, and then a rising trend on the first postoperative day, normalizing only after the second postoperative day. This gives a mixed picture of both hypo- and hypercoagulable states. The α angle and MA rose immediately after HIPEC and continued to rise until the second postoperative day. No transfusion of blood or blood products was required as guided by the TEG findings, and there was no clinical evidence of any bleeding or thromboembolic episode. Conclusion: Our study demonstrated TEG to be a useful and comprehensive tool to assess coagulopathy and accordingly guide blood product transfusion in patients undergoing CRS with HIPEC.

10

Xu, Hongyi, Ching-Hung Chuang, and Ren-Jye Yang. "Mixed-Variable Metamodeling Methods for Designing Multi-Material Structures." In ASME 2016 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference. American Society of Mechanical Engineers, 2016. http://dx.doi.org/10.1115/detc2016-59176.

Abstract:

To establish metamodels for multi-material structure design problems, the material selection of each component is treated as a categorical design variable. One challenging task is to establish an accurate mixed-variable metamodel. It is critical to reduce the prediction error of the mixed-variable metamodel in order to achieve a feasible design with superior performance in the metamodel-based optimization. This paper investigates two different strategies of mixed-variable metamodeling: the “feature separating” strategy and the “all-in-one” strategy. A supervised learning-aided method is proposed to improve the “feature separating” metamodels. The proposed method is compared with several existing mixed-variable metamodeling methods on three engineering benchmark problems to understand their relative merits. These methods include Neural Network (NN) regression, Classification and Regression Tree (CART), and Gaussian Process (GP). A new Polynomial Coefficient Metric is developed to quantify the adequacy of training data. This study provides insight and guidance for establishing proper metamodels for multi-material structural design problems.